SSIS Loading multiple files containing a column value within the filename.

I wanted to share with you a nice little SSIS package which will:

  1. Loop through each file in a directory
  2. Determine that the filename consists with what we want to process
  3. Get a value out of the filename, in this case a date, and hold it in a variable
  4. Update or insert an MS SQL table, including writing the date into the table for each row.

Ok, in the first instance you will create a new package in BIDS.

Create a variable, name filename, as a String available for the package.

From the Toolbox, within control flow, drag down a FOR EACH LOOP, double click it and configure it as follows:

In this instance I have setup the example file in C:\Users\MSSQL\Documents\Blog Post\Files. The files are Excel files, in fact in this case they are all called “*names*-ddmmyyyy.xls”. If you wanted you could write syntax to reflect this, I am dealing with this in a script object (next). Before moving on, you need to configure the For Each Loop to write the filename to a variable, this is available under Variable Mappings:

Ok, so we are happy with this. Nice and easy. Within the For Each Loop we are going to place two objects, a Script Task and Data Flow Task. Drag each of these into the For Each Loop and link them together.

Now the script object, in the first instance I am using C# rather than VB.Net. Sorry, I am not a big fan of VB.Net, especially having coded with C# (and Java). Before configuring the Script object, we need to configure another system variable, in this case it is called transactionalDate and is a DateTime:

Right then, we are ready to configure the script object. Double click on it, and populate the ReadOnlyVariables with User::filename and ReadWriteVariables with User::transactionDate.

Now the code!! Select Edit script and recreate this code:

Just to explain, this code extracts the transactionDate out of the filename, converts it to a DateTime object and then writes this back to the User variable we created. Feel free to leave out the MessageBox.Show 🙂

That is the script object done, now the data flow task. Now, the data flow task will consist of a number of tasks, but I will only focus on a few. These being the derived column, and Excel Connection. From the Merge Join down that is performing the insert or update which I have previously covered here.

So in the first instance you need to create an Excel Connection Manager, and point it to the first file (in this case I have three files). Then drag down an Excel Source and configure it as you normally would. Next up is to perform any data conversions to ensure the data from Excel is in the correct format for my table (all pretty straight forward, e.g. converting text fields to DT_STR and setting the correct length).

Now the derived column, if you drag this down and then configure it. Basically you want to make the variable transactionDate available in the flow so it can be written to the table.

As you can see this creates a column called variableStamp which get carried through to the insert / update commands and so the values are written to the table.

So with the dataflow configured as shown you are ready to run it, go for it!

Hang on a minute, how do we process multiple file when the Excel Connection Manager is pointing to one file? Ah ha, this is where it gets exciting!!! You need to go into the properties of the Connection Manager and insert the variable filename into the connectionString.

In the first instance I would recommend making a copy of the current connection string, so you can rollback if things get confusing.

Now, select the current Excel Connection Manager and hit F4 and you should be presented with the Properties window (on the bottom right hand side most likely):

You now need to select Expressions and then the browse button that appears, this will present you with:

Select the Properties box, and a drop down menu appears. Select ConnectionString from this menu and then select the Browse button after Expression:

Ok, now we need to create the connection string to include the variable User::fileName. In my case it was:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Users\MSSQL\Documents\Blog Post\Files\Names-01122010.xlsx;Extended Properties="EXCEL 12.0;HDR=YES";

And I will change it to:

"Provider=Microsoft.ACE.OLEDB.12.0;Data Source="+@[User::fileName]+";Extended Properties=\"EXCEL 12.0;HDR=YES\";"

Note the additional backslashes (\), these are the escape sequences for the quotation marks required around EXCEL 12.0;HDR=YES. Remember to use Evaluate Expression to ensure all is correct.

Click Ok, then Ok again on the expressions editor, and you will notice your connection string has changed.

That is it, we are done! Go run it…. You should see Green everywhere, and the database will contain the dates for the many file you created!

Advertisements

Tags: , , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: