This post details different variants of using the Sqoop tool for importing data.
#1 Transfer Entire table content
[root@ODIGettingStarted ~]# sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username retail_dba \
--password oracle \
--table categories \
The above command will import entire contents of "categories" table into HDFS as a CSV delimited file.
The parameters to sqoop command are -
import - specifies that the data need to be imported into HDFS from relational db
--connect - contains the JDBC URL for the source relational db
--username & --password - credentials to connect to the relational db
--table - name of the table to be transferred to HDFS
When the above command is executed, the contents of the table are imported into HDFS and a file will be created under the directory - /user/$user - as below:
#2 Specifying a Target Directory
Sqoop offers to import contents of a table into a specific HDFS directory. These are :
- --target-dir - allows to specify the final directory where the data will be imported into.
[root@ODIGettingStarted ~]# sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username retail_dba \
--password oracle \
--table categories \
--target-dir /user/root/output/categories
- --warehouse-dir - allows to specify the root directory under which a new folder will be created while importing
[root@ODIGettingStarted ~]# sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username retail_dba \
--password oracle \
--table products \
--warehouse-dir /user/root/output
With this variation a parent directory is specified under HDFS. Sqoop import when executed, rather than writing the contents into the parent directory will create a new folder matching the name of the table and imports data in there.
In both instances, sqoop will reject data if the output directory already exists.
#3 Importing only subset of data
Sqoop allows to import only a portion of the table contents into HDFS with a where clause.
[root@ODIGettingStarted ~]# sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username retail_dba \
--password oracle \
--table products \
--where "substr(product_name,1,2) = 'Sn'" \
--target-dir /user/root/output/products-Sn
The above command will only import the product names starting with "Sn" from "Products" table into HDFS and stores under the specified directory.
#4 File Formats
Sqoop can import data in 3 formats:
- CSV - text - default
- Sequencefile - binary
- Avro - binary
To invoke the sqoop import to store in binary format use below syntax
[root@ODIGettingStarted ~]# sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username retail_dba \
--password oracle \
--table products \
--as-sequencefile \
--warehouse-dir /user/root/output
[root@ODIGettingStarted ~]# sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username retail_dba \
--password oracle \
--table products \
--as-avrodatafile \
--warehouse-dir /user/root/output
#5 Importing all tables
Sqoop supports importing all tables in a database, rather than importing tables one by one. To import all tables use the tool - import-all-tables.
[root@ODIGettingStarted ~]# sqoop import-all-tables \
--connect jdbc:mysql://localhost:3306/retail_db \
--username retail_dba \
--password oracle \
--warehouse-dir /user/root/output
When executed, this command will create one folder for each of the table in the database and transfers data into it. The folder structure is as below at the end of this command execution:
mysql> show tables;
+---------------------+
| Tables_in_retail_db |
+---------------------+
| categories |
| customers |
| departments |
| order_items |
| orders |
| products |
+---------------------+
6 rows in set (0.00 sec)
mysql>
After import, the HDFS directory structure is:
[root@ODIGettingStarted conf.dist]# hdfs dfs -ls /user/root/output
Found 6 items
drwxr-xr-x - root supergroup 0 2017-05-23 22:36 /user/root/output/categories
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/customers
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/departments
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/order_items
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/orders
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/products
[root@ODIGettingStarted conf.dist]#
When executed, this command will create one folder for each of the table in the database and transfers data into it. The folder structure is as below at the end of this command execution:
mysql> show tables;
+---------------------+
| Tables_in_retail_db |
+---------------------+
| categories |
| customers |
| departments |
| order_items |
| orders |
| products |
+---------------------+
6 rows in set (0.00 sec)
mysql>
[root@ODIGettingStarted conf.dist]# hdfs dfs -ls /user/root/output
Found 6 items
drwxr-xr-x - root supergroup 0 2017-05-23 22:36 /user/root/output/categories
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/customers
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/departments
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/order_items
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/orders
drwxr-xr-x - root supergroup 0 2017-05-23 22:37 /user/root/output/products
[root@ODIGettingStarted conf.dist]#
No comments:
Post a Comment