Sqoop is open source software from Apache used to transfer data between an RDBMS (Oracle, SQL Server, MySQL) and HDFS.
MySQL Database
Connecting to the MySQL database in the Cloudera VM:
root user: root/cloudera
other user: cloudera/cloudera
[cloudera@quickstart ~]$ mysql -u root -p
Create new user in MySQL
mysql> create user cloudera1 identified by 'cloudera1';
mysql> grant all privileges on *.* to 'cloudera1';
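To verify the new account, log in with it (a quick check; the password is the one set above):
[cloudera@quickstart ~]$ mysql -u cloudera1 -p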
Listing databases in MySQL
mysql> show databases;
Switching to a database
mysql> USE retail_db;
Listing tables in retail_db database
mysql> show tables;
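Before importing, it helps to inspect the table structure from the mysql prompt (standard MySQL commands; the columns below are the ones used in the import examples later):
mysql> describe customers;
mysql> select customer_id, customer_fname, customer_email, customer_state from customers limit 5;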
Connecting to Sqoop
Version of Sqoop
[cloudera@quickstart hivedata]$ sqoop version
[cloudera@quickstart hivedata]$ sqoop help
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
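For example, list-databases and list-tables can be used to confirm connectivity before running any import (a minimal sketch reusing the same connection details as the imports below):
[cloudera@quickstart ~]$ sqoop list-databases --connect jdbc:mysql://192.168.13.135 --username cloudera --password cloudera
[cloudera@quickstart ~]$ sqoop list-tables --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera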
import command
import --> importing data from RDBMS to HDFS
1) import data from a table to HDFS (all the rows and columns)
2) import data from a table to HDFS (all the rows but specific columns)
3) import data from a table to HDFS (all the columns, filter the rows using where)
4) import data from a table to HDFS (specific columns, filter the rows using where)
Table: retail_db.customers
1) import data from a table to HDFS (all the rows and columns)
[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --table customers --m 1;
imported data will be stored in /user/cloudera/customers (default location: /user/<username>/<table name>)
view the data
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/customers/part-m-00000
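Note: passing --password on the command line leaves the password in the shell history. Sqoop also accepts -P, which prompts for the password at run time (the same import as above, sketched with the interactive prompt):
[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera -P --table customers --m 1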
2) import data from a table to HDFS (all the rows but specific columns)
[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --table customers --columns "customer_id","customer_fname","customer_email" --m 1;
imported data will be stored in /user/cloudera/customers (default location: /user/<username>/<table name>)
view the data
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/customers/part-m-00000
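Tip: sqoop eval runs a SQL statement against the source database and prints the result without writing anything to HDFS, which is handy for previewing the selected columns (a sketch reusing the same connection string):
[cloudera@quickstart hivedata]$ sqoop eval --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --query "select customer_id, customer_fname, customer_email from customers limit 5"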
3) import data from a table to HDFS (all the columns, filter the rows using where)
[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --table customers --where "customer_state='IN'" --m 1 --target-dir /user/hive/warehouse/customers
view the data
[cloudera@quickstart ~]$ hdfs dfs -cat /user/hive/warehouse/customers/part-m-00000
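Note: example 4 below writes to the same --target-dir. An import fails if the target directory already exists, so remove it first (as shown) or add the standard --delete-target-dir option to the import command:
[cloudera@quickstart ~]$ hdfs dfs -rm -r /user/hive/warehouse/customers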
4) import data from a table to HDFS (specific columns, filter the rows using where)
[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --table customers --columns "customer_id","customer_fname" --where "customer_state='IN'" --m 1 --target-dir /user/hive/warehouse/customers
view the data
[cloudera@quickstart ~]$ hdfs dfs -cat /user/hive/warehouse/customers/part-m-00000
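The specific-columns plus where combination can also be written as a free-form query with --query. Sqoop requires the literal $CONDITIONS token in the WHERE clause and an explicit --target-dir; inside double quotes the token must be escaped as \$CONDITIONS. A sketch with a single mapper (so --split-by is not needed) and a hypothetical target directory:
[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --query "select customer_id, customer_fname from customers where customer_state='IN' and \$CONDITIONS" --m 1 --target-dir /user/hive/warehouse/customers_in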