Wednesday, August 5, 2009

ORA-29275 when Accessing a UTF8 Database with SSIS

So when I first started looking into SQL Server Integration Services (SSIS), I was told that the learning curve was steep, and that it was worth it to learn what you're doing with it. Truly said. SSIS presents a myriad of possibilities for data, and once you get your head around some of the terminology, creating simple transformations and data imports is a snap. But what about when the problems you encounter span two different products? What if one is Oracle? What if it's 64-bit, on Linux? UTF8? That's really what SSIS is for, and it's possible. But. Lots of Googling. Here's one roadblock I encountered in importing data from an Oracle database into SQL server, along with an unsatisfying workaround. But it does work, and it's easy. If you know of a better solution, I'd love to hear from you!

The Problem

So here's the problem in a nutshell: the Oracle database uses a utf8 character set. SSIS, when connecting with either the OLEDB or ADO.NET data sources, would use something else. What, precisely, I'm unable to discern. I can say that the Unicode setting was set to true in the data manager. This was manifest by a startling Oracle error during the load from the data source: ORA-29275, which is a "partial multibyte character" error. This means that the data doesn't fit the database's character set, which (one assumes) is terrible: such data is almost by definition corrupted, and getting it back reliably is a tricky proposition. Oracle says, basically, that you've got bogus data when you see this error. I was prepared to believe that, as this error occurred even in a simple select statement from the DB server itself.

The Clue

The curious thing about this situation is that, in trying to figure out what was going on, another user had logged into the server using a different OS username. When he connected to the database using the same Oracle user ID, he didn't get any errors. Aha. Environment. NLS_LANG, to be exact. Setting that to American_America.UTF8 took care of the error on the server, and on clients running SQLPLUS, to boot. All should be well, correct? No. The SSIS package continues to fail. Oh, yes: the registry. Don't forget that Oracle stores client NLS data there, as well: HKLM/Software/Oracle/$ORACLE_HOME/NLS_LANG Watch out for any dangling NLS_LANG settings in HKLM/Software/Oracle That surely will fix it, right? Sadly, it didn't, and a desperation reboot didn't help.

The Work-Around

So Google leads me to hints that ADO.NET and OLEDB from Oracle don't really pay much attention to the local NLS_LANG settings. That appears to be the case, or, at least, they don't get their settings from the same place everything else does. So, I return, sadly, to ODBC. And it works! It works well. But it's so non-portable, and, let's face it: ODBC is not anything new and shiny; it'd sure be nice to have all of our SSIS packages all new, self-contained, and .NET-ed. So if you, like me, run into this problem, know that ODBC can be your friend. If you, unlike me, know of a better solution, please let us know! I'll post updates as I encounter them.

No comments:

Post a Comment

Thanks for leaving a comment!