Friday, September 4, 2009

An Introduction to SSIS - A Beginning Step-by-Step Tutorial

SQL Server Integration Services (SSIS) is a *really* powerful data transformation and import tool; it allows for all kinds of data manipulation, both between databases and within them.
The problem is that it’s not entirely intuitive; the learning curve is steep. But you really shouldn’t let that stop you from trying it out: once you’ve got the basics, it’s really quite accessible.
In this series of posts, we’ll do some basic, step-by-step data manipulation with SSIS, starting with importing data from a CSV file into a SQL Server 2008 database. We’ll move on to copying data between Oracle and SQL Server.
For more complicated data, bulk insert allows for some reasonably full choices. You can read about using it in my post here.

Importing data from a CSV file into SQL Server

SQL Server allows for a few ways to import data from a text file, most directly with the import wizard, which is a pretty accessible way to import uncomplicated data. This, by the way, uses SSIS in doing the import, though it wraps it all in a wizard.
SSIS, though, offers the most flexibility, both in terms of data source and destination, as well as what you’d like to do with the data as you’re copying it.

How do I get SSIS?

SSIS is included with SQL Server (non-express editions). It’s installed by default, and you can specify its installation specifically by selecting the “Integration Services” components of SQL Server when you’re doing the installation.
Note, by the way, that SSIS does not require a SQL Server instance in order to run.



How do I use SSIS?

Ah, here is where the rubber hits the road. The short answer is: you develop packages for SSIS to use with the SQL Server Business Intelligence Development Studio (whew!). We’ll call it BIDS from now on. (Update:  newer versions use a Visual Studio shell called "SQL Server Data Tools".) Packages that you create are run either through the development studio or on a server with Integration Services installed; we’ll begin with a simple and uncomplicated import from a standard, unformly-formatted CSV file:

Importing Uniform Data using SSIS

Let’s begin with a CSV file that looks like this:
Last Name,First Name,Phone,Age
Lastname1,Firstname1,555-555-1111,25
Lastname2,Firstname2,555-555-1112,30
Lastname3,Firstname3,555-555-1113,55
Lastname4,Firstname4,555-555-1114,84
Lastname5,Firstname5,555-555-1115,22
Lastname6,Firstname6,555-555-1116,44
Lastname7,Firstname7,555-555-1117,66
Lastname8,Firstname8,555-555-1118,31
Lastname9,Firstname9,555-555-1119,30
Lastname10,Firstname10,555-555-1120,21
Let’s save this to a file named test.csv.

Launch BIDS

Click on the start button and browse to All Programs –> Microsoft SQL Server 2008 –> SQL Server Business Intelligence Development Studio.
I hope I don’t have to type that whole name again.
image
When it launches, you’ll see a screen that looks something like the image to the right (click for a larger version).
In the top-left pane, you’ll see a few options (below). Click on the Create Project link to start a new project.
image
A new window will open, asking what kind of project you’d like to create. Select “Integration Services Project” and give it a name. You can choose a shorter path to which to save your project, as well. Click on OK to begin designing your project.
image

Create A Source Connection

When you click on OK, you’ll see a window like the one below. What we want to do is create a new connection, which is just another way of saying that we want to set up a data source. In our case, we’re going to use a CSV file, so right-click on the Connection Manager pane at the bottom and select “New Flat File Connection.”
image
That will bring up the following page. I’ve gone ahead and filled in the details.
image
Basically, we’re telling SSIS that we want to set up a data source using this file (test.csv). You can see that it’s (by default) set to parse the file as a “delimited” text file, which is what a CSV file is. Note that the first row in our file contains the names of the columns; as a result, we’ve checked “Column names in the first data row.” Leave that blank if your data doesn’t have column names in the first row.
Likewise, SSIS allows you to skip n rows before looking at the data. This is useful, especially, when using automated processes that insert a lot of commands or comments at the front of the file.
One other piece of this screen is very useful, such that it’s worth digressing just a bit: the “Text qualifier” field allows you to define a character that will set a field as a text string. This usually is a double quote. Lots of times you’ll run into a CSV that looks something like this:
FirstName,LastName,Birthdate
Brad,Benson,3/3/1970 Sam,"Watson, Jr.",7/20/1971
(This is an example from a comment on another post on bulk insert here.) You can see the problem: there’s a comma in the person’s first name, and it’s set apart by quotes. Bulk insert will parse that first name as two separate fields. Using this Text qualifier field, SSIS will render the above CSV correctly, thus:
FirstName LastName Birthdate
Brad Benson 3/3/1970
Sam Watson, Jr. 7/20/1971

OK, back to our task.

You can take a look at the settings in this window; for our purposes, the defaults should work nicely. You can preview the data by clicking on the preview link on the left-hand side; this should show you the contents of our CSV file, broken down into a table.

Create a Destination Connection

Now that we’ve got the source defined, we need to do something similar for the destination. Right-click in the connection manager window again and select “New OLE DB Connection”. Click on the “New” button to create a new connection.
When you do, you’ll see a window like that below:
image
Fill in the server name and database name appropriately and click on OK. I’ve selected the adventureworks database. You’ll notice that there’s not a place here to name the connection; that’s OK, because it’ll be named SERVERNAME\.DBName.

Create a Data Flow Task

SSIS uses workflows to step through its process, and most of your work will be done in the data flow task. Note that the data flow task will contain other sub-tasks.
While in the Control Flow tab, mouse over the Toolbox in the top-left corner (highlighted below).
image
When you do, the Toolbox will open, as below:
image
Click on the “Data Flow Task” item, and drag it to the main screen. You’ll see an icon like that below. Double-click on that icon to open it.
image
When you double-click on the task, it appears that you have just erased it, but in reality, you’ve changed to the Data Flow tab; look at the top of the screen, and you’ll see that you’re no longer on the control flow tab. You can go back and forth between these as you choose.

Add the Source task

Now that we’re editing our data flow task, click on the toolbox again to drag the Flat File Source to the main screen. By default, it’ll be populated with the CSV connection we set up earlier.
image
One thing to note is the “Retain null values from the source as null values in the data flow.” This is disabled by default, which means that null values in the source will be treated as empty strings, instead.
You can click on the Preview button to double-check your file.

Create the Data Destination

Now that we’re reading the data, we need to insert it into the database. Click on the Toolbox again and scroll down to the Data Flow Destinations section. Drag the OLE DB Destination to the main window.
Before we can write any data, we have to tell SSIS how to map the data. In our example, we’re going to create a new table in the Adventureworks database, but the process for using an existing table is the same.
imageFirst, click on the Flat File source icon and drag the green arrow to the OLE DB Destination icon.
Next, double-click on the Destination icon to edit it: Notice that this also allows you to keep nulls. In our example, we’re going to create a new table, but if you have a table already created, you can select it from the table drop-down menu.
image
Clicking on the New… button will bring the the following screen:
image Change the table name in the SQL statement (I chose “importTest”). Of course, we’d also normally want to change the data types; the age field would be better as an integer field, and we wouldn’t normally want the phone number to be a 50-character field. We’ll leave it with defaults for our demonstration, however. Click on OK.
Finally, we need to set the mappings: click on the Mappings option on the left-hand side of the screen:
image
The defaults should be fine for our purposes, so you can click on OK.

Test it out

We should be good to go. Hit <F5> to run your package. You should see each step turn green as the data is read and written.
image
We’ll spend some time in another post on scheduling SSIS packages; there are a lot of things that can complicate that process.

28 comments:

  1. Excellent Post! That was easy to follow and my import worked on the first attempt. Thank you.

    ReplyDelete
  2. Good post. I am new to SSIS, it was a good start for me.

    ReplyDelete
  3. Nice article for beginers.

    ReplyDelete
  4. Thanks for the tutorial. Very nice article for beginners of SSIS

    ReplyDelete
  5. I'm pleased it was useful!

    ReplyDelete
  6. I am new to SSIS and this post was the best. Excellent post.

    ReplyDelete
  7. You gave me a good path beginning my SSIS practice

    ReplyDelete
  8. Hi..am impressed....It was very useful to start my SSIS practice

    ReplyDelete
  9. Just what the doctors recommended for anyone new to SSIS.
    The combination of notes and screenshots simply awesome !!!
    Thanks a ton

    ReplyDelete
  10. I'm really pleased it's useful; thanks for the feedback!

    ReplyDelete
  11. thaks Lane pls keep up this kind of posts in future

    ReplyDelete
  12. Its very helpful for the beginers...thanks Lane

    ReplyDelete
  13. HI yes it is too nice... Thankuu Lane

    ReplyDelete
  14. Thanks Lane...it is really helpful.

    ReplyDelete
  15. I'd like to add my thanks to the above. The tutorial is much clearer than many of the others that one finds with Google.

    ReplyDelete
  16. Yes, I agree. This is an excellent tutorial and it worked for me on the first attempt. Thank you.

    ReplyDelete
  17. Thank you so much. This is something I was looking for.Easy to understand.Very helpful.


    Cheers!

    ReplyDelete
  18. Great... Thank you very much. :-)

    ReplyDelete
  19. Thank you very much......!

    ReplyDelete
  20. very helpful for beginners! good job! :)

    ReplyDelete
  21. Very Excellent for beginners.. Great.. Thank you sir...

    ReplyDelete
  22. Really helpful for beginners, exactly what I was looking for. Thank you!

    ReplyDelete
  23. thank u for giving this best information..we are offering msbi online training

    ReplyDelete
  24. Thank you for your contribution, Štefan.

    ReplyDelete
  25. Nice post. very useful. Thank you a bunch

    ReplyDelete

Thanks for leaving a comment!