Showing posts with label SSIS. Show all posts
Showing posts with label SSIS. Show all posts

Sunday, 21 February 2010

SQL Server - MySQL 4 integration - openquery does not work for tables with datetime values of 0000-00-00

I have been trying copy the contents of an old MySQL 4.0.12 database to SQL Server 2008 in order to archive its contents.

To archive this, I followed these steps:
  1. Create an database within SQL Server 2008 to hold the data called ARCHIVE_DB.
  2. Download and install ODBC driver for MySQL (available here).  Note that due to the version of the MySQL database I was unable to use the newer driver (version 5.1.xx), since this requires version 4.1 of MySQL.
  3. Set up a linked server in SQL Server 2008 using this driver to point to the data source.  I called this data source MYSQL_ls.
  4. I can now select data from this data source using the "openquery" TSQL syntax, eg:


    SELECT *
    FROM openquery([NADT-SQLHOST], 'select * from webdb.episode')
    

  5. I can extend this query using the TSQL "INTO" clause, which will build the SQL table from the source data, eg:


    SELECT * INTO ARCHIVE_DB
    FROM openquery([MYSQL_ls], 'select * from mydatabase.mytable')
    

  6. In theory the above will copy all the data from the MySQL database table called (in the MySQL database called ) into the SQL Server table (SQL Server database ).  I do not have to create the table schema, this will be created automatically by the INTO clause.  
  7. I can now create an SSIS package that loops through the tables in the MySQL database and runs the SQL statement show in number 5 against each table.  This should in theory copy each table across.
This technique works very well until you come to tables that include datetime values.

MySQL allows datetime values in the format of "0000-00-00", which is the equivalent of a NULL date in SQL server.  Unfortunately the "Select * Into From Openquery..." trick above does not deal with this situation, and you will recieve a version of the following error:

Msg 7342, Level 16, State 1, Line 4
An unexpected NULL value was returned for column "[MSDASQL].enddate" from OLE DB provider "MSDASQL" for linked server "MYSQL_ls". This column cannot be NULL.

This error is caused by using "openquery".  Fortunately there is a very easy work around for this:  where tables have datetimes in "0000-00-00" format, you must use an SSIS data flow task to copy the information, rather than the "openquery" described above:

  1. Set the data pump source to the MySQL table pointed to in the Linked server and click on "preview" and you will notice that it can return data.  This is not possible by running a query in the SQL Management studio, so clearly we are getting somewhere.  
  2. Now set the data pump target to "" and click on "Mappings" and you will notice that each column in the source table will be mapped to an identically named table in the destination table.
  3. Save the data pump and run.

If you run this data flow, the data will be copied across without reporing the datetime mismatch error.

Wednesday, 17 February 2010

Use a variable as a table name in an SSIS Execute SQL Task

I spent quite a while today trying to build an SSIS package that iterated through a dataset that contained a set of table names usign a for-each container, then run a piece of SQL against each table.

Setting up the for-each loop was relatively simple, but I couldn't get the Execute SQL task to run a simple query of the type:
Select count(*) from @table
Where @table is the name of the parameter generated from the for each loop.

It appears that this is not possible in SSIS.  Eventually I found a solution:  use another variable to store the SQL statement and set up the SQL task to use this variable as its input source.

Assuming that the input parameter is already set up and called [User::TableName],  then the steps to achive this are:

  1. Set up a new variable (say called "SQLStatement"), make this a string type.
  2. In the properties box of the paremeter, Set EvaluateAsExpression to True
  3. In the properties box of the paremeter, click on the elipses (...) of the "Expression" property to build an expression.  Type in the SQL you need and drag the variable you want to use from the "variables" tree.
  4. Escape you SQL in this expression wwithin quotation marks.  e.g. the count statement above would look like:
  5. "Select count(*) from " + @[User:TableName] 
  6.  You can use the "Evaluate expression" button to make sure that this evaluates as real SQL.  If you are using a "for-each" container then you may want to give your [User::TableName] variable a sensible default value in order to see the results of the evaluated expression in a format you could use for testing.
  7. Drag an "Execute SQL Task" object into your control flow and open its properties page
  8. Set the "SQL Source Type" property to "Variable" to tell it that the SQL will come from a variable, rather from a typed in SQL statement.
  9. Set the "SourceVariable" property to the name of the variable you created in step 1 (in this example it would be "SQLStatement".
That's it!