Archive for the ‘SSIS’ Category

SSIS Loading multiple files containing a column value within the filename.

December 7, 2010

I wanted to share with you a nice little SSIS package which will:

  1. Loop through each file in a directory
  2. Determine that the filename consists with what we want to process
  3. Get a value out of the filename, in this case a date, and hold it in a variable
  4. Update or insert an MS SQL table, including writing the date into the table for each row.

Ok, in the first instance you will create a new package in BIDS.

Create a variable, name filename, as a String available for the package.

From the Toolbox, within control flow, drag down a FOR EACH LOOP, double click it and configure it as follows:

In this instance I have setup the example file in C:\Users\MSSQL\Documents\Blog Post\Files. The files are Excel files, in fact in this case they are all called “*names*-ddmmyyyy.xls”. If you wanted you could write syntax to reflect this, I am dealing with this in a script object (next). Before moving on, you need to configure the For Each Loop to write the filename to a variable, this is available under Variable Mappings:

Ok, so we are happy with this. Nice and easy. Within the For Each Loop we are going to place two objects, a Script Task and Data Flow Task. Drag each of these into the For Each Loop and link them together.

Now the script object, in the first instance I am using C# rather than VB.Net. Sorry, I am not a big fan of VB.Net, especially having coded with C# (and Java). Before configuring the Script object, we need to configure another system variable, in this case it is called transactionalDate and is a DateTime:

Right then, we are ready to configure the script object. Double click on it, and populate the ReadOnlyVariables with User::filename and ReadWriteVariables with User::transactionDate.

Now the code!! Select Edit script and recreate this code:

Just to explain, this code extracts the transactionDate out of the filename, converts it to a DateTime object and then writes this back to the User variable we created. Feel free to leave out the MessageBox.Show 🙂

That is the script object done, now the data flow task. Now, the data flow task will consist of a number of tasks, but I will only focus on a few. These being the derived column, and Excel Connection. From the Merge Join down that is performing the insert or update which I have previously covered here.

So in the first instance you need to create an Excel Connection Manager, and point it to the first file (in this case I have three files). Then drag down an Excel Source and configure it as you normally would. Next up is to perform any data conversions to ensure the data from Excel is in the correct format for my table (all pretty straight forward, e.g. converting text fields to DT_STR and setting the correct length).

Now the derived column, if you drag this down and then configure it. Basically you want to make the variable transactionDate available in the flow so it can be written to the table.

As you can see this creates a column called variableStamp which get carried through to the insert / update commands and so the values are written to the table.

So with the dataflow configured as shown you are ready to run it, go for it!

Hang on a minute, how do we process multiple file when the Excel Connection Manager is pointing to one file? Ah ha, this is where it gets exciting!!! You need to go into the properties of the Connection Manager and insert the variable filename into the connectionString.

In the first instance I would recommend making a copy of the current connection string, so you can rollback if things get confusing.

Now, select the current Excel Connection Manager and hit F4 and you should be presented with the Properties window (on the bottom right hand side most likely):

You now need to select Expressions and then the browse button that appears, this will present you with:

Select the Properties box, and a drop down menu appears. Select ConnectionString from this menu and then select the Browse button after Expression:

Ok, now we need to create the connection string to include the variable User::fileName. In my case it was:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Users\MSSQL\Documents\Blog Post\Files\Names-01122010.xlsx;Extended Properties="EXCEL 12.0;HDR=YES";

And I will change it to:

"Provider=Microsoft.ACE.OLEDB.12.0;Data Source="+@[User::fileName]+";Extended Properties=\"EXCEL 12.0;HDR=YES\";"

Note the additional backslashes (\), these are the escape sequences for the quotation marks required around EXCEL 12.0;HDR=YES. Remember to use Evaluate Expression to ensure all is correct.

Click Ok, then Ok again on the expressions editor, and you will notice your connection string has changed.

That is it, we are done! Go run it…. You should see Green everywhere, and the database will contain the dates for the many file you created!

Advertisements

SSIS insert and update rows in a table based on the contents of a Excel file

October 3, 2010

Lets imagine you have an Excel file of First name, Surname, and Age. Also contained in this file is a unique identifier (UID), here is an example:

ID First Surname Age
1 Bob Smith 12
2 Tom Stoner 34
3 Brenda Jones 45

Now, lets continue to imagine that you have a corresponding table in SQL Server:

So, the situation is thus; We want to update records that exist in both the table and the Excel file and insert those that exist in the Excel file but not in the table. How do we do this…. Up steps SSIS.

I am going to assume you know the basics of SSIS and so will start from the perpective of a new project in BIDS.

Under Control Flow we need to add a new Control Flow Item – Data Flow. Rename this to Insert and Update records. Now double click on this item, or select the Data Flow tab.
Now things get exciting, we want to create a data flow that queries both the Excel File and the database table, both will need to be sorted on the same column, and then we perform a left outer join. At this stage we can determine which rows of the Excel sheet are new, and which exist already. To do this we use a conditional split where we ascertain if the UID is null (from the database table), if it is we know this is a new row and so should be inserted into the table, if it isn’t we know this row exists in the table and so the row should be updated (perhaps one of the column values has changed).

Without further ado, let us begin.

1. Drag onto the canvas an Excel Source
2. Configure the Excel Source to open our Excel file (you will need to create an Excel Connection manager)
3. Drag onto the canvas an OLE DB source
4. Configure the OLE DB source to reference the person table (you will need to create an OLE DB Connection manager instance)
5. Before sorting the columns, I would advise converting the Excel columns to the same type as database columns, in this case I have selected the four columns:

a. ID converted to Four Byte Signed Integer
b. Firstname to String(100)
c. Surname to String(100)
d. Age to Four Byte Signed Integer

6. Now that is in place, both streams of data need to be sorted by the ID columns (note, make sure you pass through the columns you have just converted and don’t pass through the Excel columns.

Ok, so now the exciting bit. We will perform the left outer join and the conditional to determine if each row is either a new row or existing row.
1. Drag a Merge Join onto the canvas
2. Drag a Conditional split onto the canvas
3. Drag the line from the Excel Sort to the Merge Join, this will be your Merge Left Input
4. Drag the line from the DB Sort to the Merge Join
5. Double click on the Merge Join, select the UID column from the database, and the UID, FirstName, Surname, and Age columns from the Excel Source.

6. Remeber to change the join type to Left outer Join

Now we can perform the conditional split:
1. Drag the line from the Merge Join to the conditional split
2. Double click the conditional split and configure as shown below:

This will now give us two streams coming out of the conditional split, those belonging to the the NewRow and those belonging to the ExistingRow. The data belonging to the new row will be inserted into the table, and the data belonging to the Existing Row will update said row.

So, configuring the insertion is quite easy;
1. Drag an OLE DB Destination onto the canvas
2. Configure the OLD DB Destination to insert to the person table
3. Drag the line from the conditional split to the OLE DB Destination, you will be prompted to choose the output, you want to select NewRow (as defined in the Conditional Split)
4. Choosing the mapping tab and ensure they map to the columns in the table

That is the insert sorted, how about the update;

1. Drag an OLD DB Command onto the canvas
2. Drag the green line from the conditional split to the OLD DB Command, you will be prompted to choose the output, select ExistingRow (as defined in the conditional split)
3. To configure the OLD DB Command, Double click on the item:

a. Select the relevant connection manager (it is the DB one you created earlier)
b. Under Custom Properties select SQLCOMMAND and the button at the end of the input.
c. You are now prompted with an input box, you need to enter an SQL update command (example below)

d. You then need to configure the parameter mappings top the columns (example below)

With that complete you are done!! I will leave you to configure error checking etc, and how you want to trigger this. Your data flow should look something like this:

I hope this proves useful?

Point of note, I suffered a slight problem on my machine because it is 64Bit, I received the error message “DTS_E_OLEDB_EXCEL_NOT_SUPPORTED” – Have a read of these two articles: here and here.