SDET-QA Blog: Apache Sqoop Overview & Import Data From MySQL to HDFS

Overview on Sqoop

Sqoop is open source s/w from Apache used for transfer data between RDBMS(Oracle, SQL Server, MySQL) and HDFS.

MySQL Database
Connecting to MySQL Database in cloudera VM:

root user: root/cloudera
other user: cloudera/cloudera

[cloudera@quickstart ~]$ mysql -u root -p

Create new user in MySQL
mysql> create user cloudera1 identified by 'cloudera1';
mysql> grant all privileges on *.* to 'cloudera1';

Listing databases in MySQL
mysql> show databases;

Switching to db
mysql> USE retail_db;

Listing tables in retail_db database
mysql> show tables;

Connecting to Sqoop

Version of Sqoop
[cloudera@quickstart hivedata]$ sqoop version

[cloudera@quickstart hivedata]$ sqoop help

Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information

import command

import --> importing data from RDBMS to HDFS

1) import data from a table to HDFS(all the rows and columns)
2) import data from a table to HDFS (all the rows but specific columns)
3) import data from a table to HDFS (all the columns, filter the rows by usign where)
4) import data from a table to HDFS (Specific columns,filter the rows by usign where)

Table: trails_db.customers

1) import data from a table to HDFS(all the rows and columns)

[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --table customers --m 1;

imported data will be stored in /user/cloudera (default location)

view the data
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/customers/part-m-00000

2) import data from a table to HDFS (all the rows but specific columns)

[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --table customers --columns "customer_id","customer_fname","customer_email" --m 1;

imported data will be stored in /user/cloudera (default location)

view the data
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/customers/part-m-00000

3) import data from a table to HDFS (all the columns, filter the rows by usign where)

[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --table customers --where "customer_state='IN'" --m 1 --target-dir /user/hive/warehouse/customers

view the data
[cloudera@quickstart ~]$ hdfs dfs -cat /user/hive/warehouse/customers/part-m-00000

4) import data from a table to HDFS (Specific columns,filter the rows by usign where)

[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://192.168.13.135/retail_db --username cloudera --password cloudera --table customers --columns "customer_id","customer_fname" --where "customer_state='IN'" --m 1 --target-dir /user/hive/warehouse/customers

view the data
[cloudera@quickstart ~]$ hdfs dfs -cat /user/hive/warehouse/customers/part-m-00000

SDET-QA Blog

Pages

Apache Sqoop Overview & Import Data From MySQL to HDFS

Followers

Most Trending

Pages