This recipe shows how to configure Sqoop to connect with Microsoft SQL Server databases. This will allow data to be efficiently loaded from a Microsoft SQL Server database into HDFS.
This example uses Sqoop v1.3.0.
If you are using CDH3, you already have Sqoop installed. If you are not running CDH3, you can find instructions for your distro at https://ccp.cloudera.com/display/CDHDOC/Sqoop+Installation.
This recipe assumes that you have an instance of SQL Server up and running that can connect to your Hadoop cluster.
Complete the following steps to configure Sqoop to connect with Microsoft SQL Server:
This download contains the SQL Server JDBC driver (sqljdbc4.jar
). Sqoop connects to relational databases using JDBC drivers.
gzip -d sqljdbc_3.0.1301.101_enu.tar.gz tar -xvf sqljdbc_3.0.1301.101_enu.tar
This will result in a new folder being created, sqljdbc_3.0
.
sqljdbc4.jar
to $SQOOP_HOME/lib
:cp sqljdbc_3.0/enu/sqljdbc4.jar $SQOOP_HOME/lib
Sqoop now has access to the sqljdbc4.jar
file and will be able to use it to connect to a SQL Server instance.
gzip -d sqoop-sqlserver-1.0.tar.gz tar -xvf sqoop-sqlserver-1.0.tar
This will result in a new folder being
created, sqoop-sqlserver-1.0
.
MSSQL_CONNECTOR_HOME
environment variable:export MSSQL_CONNECTOR_HOME=/path/to/sqoop-sqlserver-1.0
./install.sh
--connect
argument must be changed to --connect jdbc:sqlserver://<HOST>:<PORT>
.Sqoop communicates with databases using JDBC. After adding the sqljdbc4.jar
file to the $SQOOP_HOME/lib
folder, Sqoop will be able to connect to SQL Server instances using --connect jdbc:sqlserver://<HOST>:<PORT>
. In order for SQL Server to have full compatibility with Sqoop, some configuration changes are necessary. The configurations are updated by running the install.sh
script.