Friday, September 4, 2009

An Introduction to SSIS - A Beginning Step-by-Step Tutorial

SQL Server Integration Services (SSIS) is a *really* powerful data transformation and import tool; it allows for all kinds of data manipulation, both between databases and within them.
The problem is that it’s not entirely intuitive; the learning curve is steep. But you really shouldn’t let that stop you from trying it out: once you’ve got the basics, it’s really quite accessible.
In this series of posts, we’ll do some basic, step-by-step data manipulation with SSIS, starting with importing data from a CSV file into a SQL Server 2008 database. We’ll move on to copying data between Oracle and SQL Server.
For more complicated data, bulk insert allows for some reasonably full choices. You can read about using it in my post here.

Importing data from a CSV file into SQL Server

SQL Server allows for a few ways to import data from a text file, most directly with the import wizard, which is a pretty accessible way to import uncomplicated data. This, by the way, uses SSIS in doing the import, though it wraps it all in a wizard.
SSIS, though, offers the most flexibility, both in terms of data source and destination, as well as what you’d like to do with the data as you’re copying it.

How do I get SSIS?

SSIS is included with SQL Server (non-express editions). It’s installed by default, and you can specify its installation specifically by selecting the “Integration Services” components of SQL Server when you’re doing the installation.
Note, by the way, that SSIS does not require a SQL Server instance in order to run.



How do I use SSIS?

Ah, here is where the rubber hits the road. The short answer is: you develop packages for SSIS to use with the SQL Server Business Intelligence Development Studio (whew!). We’ll call it BIDS from now on. (Update:  newer versions use a Visual Studio shell called "SQL Server Data Tools".) Packages that you create are run either through the development studio or on a server with Integration Services installed; we’ll begin with a simple and uncomplicated import from a standard, unformly-formatted CSV file:

Importing Uniform Data using SSIS

Let’s begin with a CSV file that looks like this:
Last Name,First Name,Phone,Age
Lastname1,Firstname1,555-555-1111,25
Lastname2,Firstname2,555-555-1112,30
Lastname3,Firstname3,555-555-1113,55
Lastname4,Firstname4,555-555-1114,84
Lastname5,Firstname5,555-555-1115,22
Lastname6,Firstname6,555-555-1116,44
Lastname7,Firstname7,555-555-1117,66
Lastname8,Firstname8,555-555-1118,31
Lastname9,Firstname9,555-555-1119,30
Lastname10,Firstname10,555-555-1120,21
Let’s save this to a file named test.csv.

Launch BIDS

Click on the start button and browse to All Programs –> Microsoft SQL Server 2008 –> SQL Server Business Intelligence Development Studio.
I hope I don’t have to type that whole name again.
image
When it launches, you’ll see a screen that looks something like the image to the right (click for a larger version).
In the top-left pane, you’ll see a few options (below). Click on the Create Project link to start a new project.
image
A new window will open, asking what kind of project you’d like to create. Select “Integration Services Project” and give it a name. You can choose a shorter path to which to save your project, as well. Click on OK to begin designing your project.
image

Create A Source Connection

When you click on OK, you’ll see a window like the one below. What we want to do is create a new connection, which is just another way of saying that we want to set up a data source. In our case, we’re going to use a CSV file, so right-click on the Connection Manager pane at the bottom and select “New Flat File Connection.”
image
That will bring up the following page. I’ve gone ahead and filled in the details.
image
Basically, we’re telling SSIS that we want to set up a data source using this file (test.csv). You can see that it’s (by default) set to parse the file as a “delimited” text file, which is what a CSV file is. Note that the first row in our file contains the names of the columns; as a result, we’ve checked “Column names in the first data row.” Leave that blank if your data doesn’t have column names in the first row.
Likewise, SSIS allows you to skip n rows before looking at the data. This is useful, especially, when using automated processes that insert a lot of commands or comments at the front of the file.
One other piece of this screen is very useful, such that it’s worth digressing just a bit: the “Text qualifier” field allows you to define a character that will set a field as a text string. This usually is a double quote. Lots of times you’ll run into a CSV that looks something like this:
FirstName,LastName,Birthdate
Brad,Benson,3/3/1970 Sam,"Watson, Jr.",7/20/1971
(This is an example from a comment on another post on bulk insert here.) You can see the problem: there’s a comma in the person’s first name, and it’s set apart by quotes. Bulk insert will parse that first name as two separate fields. Using this Text qualifier field, SSIS will render the above CSV correctly, thus:
FirstName LastName Birthdate
Brad Benson 3/3/1970
Sam Watson, Jr. 7/20/1971

OK, back to our task.

You can take a look at the settings in this window; for our purposes, the defaults should work nicely. You can preview the data by clicking on the preview link on the left-hand side; this should show you the contents of our CSV file, broken down into a table.

Create a Destination Connection

Now that we’ve got the source defined, we need to do something similar for the destination. Right-click in the connection manager window again and select “New OLE DB Connection”. Click on the “New” button to create a new connection.
When you do, you’ll see a window like that below:
image
Fill in the server name and database name appropriately and click on OK. I’ve selected the adventureworks database. You’ll notice that there’s not a place here to name the connection; that’s OK, because it’ll be named SERVERNAME\.DBName.

Create a Data Flow Task

SSIS uses workflows to step through its process, and most of your work will be done in the data flow task. Note that the data flow task will contain other sub-tasks.
While in the Control Flow tab, mouse over the Toolbox in the top-left corner (highlighted below).
image
When you do, the Toolbox will open, as below:
image
Click on the “Data Flow Task” item, and drag it to the main screen. You’ll see an icon like that below. Double-click on that icon to open it.
image
When you double-click on the task, it appears that you have just erased it, but in reality, you’ve changed to the Data Flow tab; look at the top of the screen, and you’ll see that you’re no longer on the control flow tab. You can go back and forth between these as you choose.

Add the Source task

Now that we’re editing our data flow task, click on the toolbox again to drag the Flat File Source to the main screen. By default, it’ll be populated with the CSV connection we set up earlier.
image
One thing to note is the “Retain null values from the source as null values in the data flow.” This is disabled by default, which means that null values in the source will be treated as empty strings, instead.
You can click on the Preview button to double-check your file.

Create the Data Destination

Now that we’re reading the data, we need to insert it into the database. Click on the Toolbox again and scroll down to the Data Flow Destinations section. Drag the OLE DB Destination to the main window.
Before we can write any data, we have to tell SSIS how to map the data. In our example, we’re going to create a new table in the Adventureworks database, but the process for using an existing table is the same.
imageFirst, click on the Flat File source icon and drag the green arrow to the OLE DB Destination icon.
Next, double-click on the Destination icon to edit it: Notice that this also allows you to keep nulls. In our example, we’re going to create a new table, but if you have a table already created, you can select it from the table drop-down menu.
image
Clicking on the New… button will bring the the following screen:
image Change the table name in the SQL statement (I chose “importTest”). Of course, we’d also normally want to change the data types; the age field would be better as an integer field, and we wouldn’t normally want the phone number to be a 50-character field. We’ll leave it with defaults for our demonstration, however. Click on OK.
Finally, we need to set the mappings: click on the Mappings option on the left-hand side of the screen:
image
The defaults should be fine for our purposes, so you can click on OK.

Test it out

We should be good to go. Hit <F5> to run your package. You should see each step turn green as the data is read and written.
image
We’ll spend some time in another post on scheduling SSIS packages; there are a lot of things that can complicate that process.

34 comments:

  1. Excellent Post! That was easy to follow and my import worked on the first attempt. Thank you.

    ReplyDelete
  2. Good post. I am new to SSIS, it was a good start for me.

    ReplyDelete
  3. Nice article for beginers.

    ReplyDelete
  4. Thanks for the tutorial. Very nice article for beginners of SSIS

    ReplyDelete
  5. I am new to SSIS and this post was the best. Excellent post.

    ReplyDelete
  6. You gave me a good path beginning my SSIS practice

    ReplyDelete
  7. Hi..am impressed....It was very useful to start my SSIS practice

    ReplyDelete
  8. Just what the doctors recommended for anyone new to SSIS.
    The combination of notes and screenshots simply awesome !!!
    Thanks a ton

    ReplyDelete
  9. I'm really pleased it's useful; thanks for the feedback!

    ReplyDelete
  10. thaks Lane pls keep up this kind of posts in future

    ReplyDelete
  11. Its very helpful for the beginers...thanks Lane

    ReplyDelete
  12. HI yes it is too nice... Thankuu Lane

    ReplyDelete
  13. Thanks Lane...it is really helpful.

    ReplyDelete
  14. I'd like to add my thanks to the above. The tutorial is much clearer than many of the others that one finds with Google.

    ReplyDelete
  15. Yes, I agree. This is an excellent tutorial and it worked for me on the first attempt. Thank you.

    ReplyDelete
  16. Thank you so much. This is something I was looking for.Easy to understand.Very helpful.


    Cheers!

    ReplyDelete
  17. Great... Thank you very much. :-)

    ReplyDelete
  18. Thank you very much......!

    ReplyDelete
  19. very helpful for beginners! good job! :)

    ReplyDelete
  20. Very Excellent for beginners.. Great.. Thank you sir...

    ReplyDelete
  21. Really helpful for beginners, exactly what I was looking for. Thank you!

    ReplyDelete
  22. thank u for giving this best information..we are offering msbi online training

    ReplyDelete
  23. Thank you for your contribution, Štefan.

    ReplyDelete
  24. Nice post. very useful. Thank you a bunch

    ReplyDelete
  25. Any animal can take a gander at the WiFi Password Hacker Online segment in the graph and might want to utilize it free of charge. Online Wifi Password Hack

    ReplyDelete
  26. Mega Downloader APK gives client controlled scrambled distributed storage space and talk from one side to another ordinary internet browsers, mutually with dedicated applications for portable methodology. Mega Download Link

    ReplyDelete
  27. I just want you to know that if you are out there and you are being really hard on yourself right now for something that has happened ... it's normal. That is what is going to happen to you in life. . Beautiful Life Quotes

    ReplyDelete
  28. We just finished our keygen for the new Autodesk Infraworks 2021, another work made by our amazing guys. Its not tested in every sistem, so we'll be very XForce Keymaker

    ReplyDelete
  29. Don't worry, you can still install and activate Windows Vista legally in 2022. There are many ways through which you can activate your Windows .Windows Vista Ultimate Product Key 2022

    ReplyDelete

Thanks for leaving a comment!