To archive this, I followed these steps:
- Create an database within SQL Server 2008 to hold the data called ARCHIVE_DB.
- Download and install ODBC driver for MySQL (available here). Note that due to the version of the MySQL database I was unable to use the newer driver (version 5.1.xx), since this requires version 4.1 of MySQL.
- Set up a linked server in SQL Server 2008 using this driver to point to the data source. I called this data source MYSQL_ls.
- I can now select data from this data source using the "openquery" TSQL syntax, eg:
SELECT * FROM openquery([NADT-SQLHOST], 'select * from webdb.episode')
- I can extend this query using the TSQL "INTO" clause, which will build the SQL table from the source data, eg:
SELECT * INTO ARCHIVE_DB FROM openquery([MYSQL_ls], 'select * from mydatabase.mytable')
- In theory the above will copy all the data from the MySQL database table called
(in the MySQL database called ) into the SQL Server table (SQL Server database ). I do not have to create the table schema, this will be created automatically by the INTO clause. - I can now create an SSIS package that loops through the tables in the MySQL database and runs the SQL statement show in number 5 against each table. This should in theory copy each table across.
MySQL allows datetime values in the format of "0000-00-00", which is the equivalent of a NULL date in SQL server. Unfortunately the "Select * Into From Openquery..." trick above does not deal with this situation, and you will recieve a version of the following error:
Msg 7342, Level 16, State 1, Line 4 An unexpected NULL value was returned for column "[MSDASQL].enddate" from OLE DB provider "MSDASQL" for linked server "MYSQL_ls". This column cannot be NULL.
This error is caused by using "openquery". Fortunately there is a very easy work around for this: where tables have datetimes in "0000-00-00" format, you must use an SSIS data flow task to copy the information, rather than the "openquery" described above:
- Set the data pump source to the MySQL table pointed to in the Linked server and click on "preview" and you will notice that it can return data. This is not possible by running a query in the SQL Management studio, so clearly we are getting somewhere.
- Now set the data pump target to "
" and click on "Mappings" and you will notice that each column in the source table will be mapped to an identically named table in the destination table. - Save the data pump and run.
If you run this data flow, the data will be copied across without reporing the datetime mismatch error.