SSIS SQL Server 2005
Contents
Introduction
Design Environment
  Connection Managers Pane
  Toolbox
  Control Flow Task
  Data Flow Task
  Event Handlers
  Package Explorer
Connection Managers
  Database
  File
  Special
Control Flow Elements
  Containers
  Control Flow Tasks
Data Flow Components
  Sources
  Destinations
  Transformations
Package Configurations
Check Point
Error Handling
Executing
Further Reading
Introduction
SQL Server Integration Services (SSIS) is most commonly described as an extract-transform-load (ETL) tool. ETL tools are traditionally associated with preparing data for warehousing, analysis and reporting, but SSIS represents a step beyond that traditional role. It is really a robust programming environment that happens to be good at data- and database-related tasks.
Design Environment
SQL Server Business Intelligence Development Studio (SSBIDS) is used to develop SSIS packages. The following are the panes you need to know to develop SSIS packages.
Toolbox
Provides a list of tasks that can be dragged onto the design surface. The list of available tasks varies depending on which tab is selected.
Event Handlers
Events are exposed for the overall package and for each task within it. Tasks placed here execute when the chosen event, such as OnError, fires.
Package Explorer
Lists all of the package's elements in a single tree view. This can be helpful for discovering configured elements that are not always obvious in other views, such as event handlers and variables.
Connection Managers
A connection manager is a wrapper for the connection string and properties required to make a connection at runtime. Once the connection is defined, it can be referenced by other elements in the package without duplicating the connection definition. This simplifies both the management of this information and its configuration for alternate environments.
Database
Defining database connections through one of the available connection managers requires setting a few key properties such as Provider, Server, Initial Catalog and Security.
The first choice for accessing databases is generally an OLE DB connection manager using one of the many native providers, including SQL Server, Oracle etc.
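As a rough illustration of what a connection manager wraps, the sketch below assembles an OLE DB style connection string from the key properties named above. The function name, the server `DEVSQL01` and the database `Sales` are hypothetical. The provider name `SQLNCLI` (SQL Native Client) is assumed as a typical choice for SQL Server 2005, and the Server property is written as `Data Source`, as it appears in the connection string itself.

```python
def build_oledb_connection_string(server, initial_catalog, provider="SQLNCLI",
                                  integrated_security=True, user=None, password=None):
    """Assemble an OLE DB style connection string from a few key properties.

    Illustrative only: in SSIS the connection manager editor builds this string.
    """
    parts = [
        ("Provider", provider),
        ("Data Source", server),            # the "Server" property
        ("Initial Catalog", initial_catalog),
    ]
    if integrated_security:
        # Windows authentication
        parts.append(("Integrated Security", "SSPI"))
    else:
        # SQL Server authentication
        parts.append(("User ID", user))
        parts.append(("Password", password))
    return ";".join(f"{key}={value}" for key, value in parts) + ";"


# Hypothetical development server and database names.
conn = build_oledb_connection_string("DEVSQL01", "Sales")
print(conn)
```

Because every task refers to the connection manager rather than to the string, pointing the package at another environment means changing only this one definition.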
File
Remember that every file or folder referenced needs to be available not only at design time, but also after a package is deployed. Consider using UNC paths for file connections. The file connection managers are listed here.
Flat File: Presents a text file as if it were a table, with header options. The file can be in one of three formats.
  Delimited: File data is separated by column and row delimiters.
  Fixed Width: File data has known column sizes without either column or row delimiters.
  Ragged Right: File data is interpreted using fixed widths for all columns except the last, which is terminated by the row delimiter.
Excel: Identifies a file containing a group of cells that can be interpreted as a table.
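The three flat file formats differ only in how column and row boundaries are found. The following is a minimal Python sketch of that difference, not SSIS code; the function names and sample data are made up for illustration.

```python
import csv
import io


def parse_delimited(text, column_delimiter=","):
    """Delimited: columns and rows are both marked by delimiters."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=column_delimiter)]


def parse_fixed_width(text, widths):
    """Fixed Width: no delimiters at all; every row is sum(widths) characters."""
    row_length = sum(widths)
    rows = []
    for start in range(0, len(text), row_length):
        record = text[start:start + row_length]
        row, pos = [], 0
        for width in widths:
            row.append(record[pos:pos + width].strip())
            pos += width
        rows.append(row)
    return rows


def parse_ragged_right(text, widths):
    """Ragged Right: fixed widths for the leading columns; the last column
    runs to the row delimiter, so rows may differ in length."""
    rows = []
    for line in text.splitlines():
        row, pos = [], 0
        for width in widths:
            row.append(line[pos:pos + width].strip())
            pos += width
        row.append(line[pos:].strip())
        rows.append(row)
    return rows


print(parse_delimited("1,Widget\n2,Gadget\n"))
print(parse_fixed_width("1 Widget2 Gadget", [2, 6]))
print(parse_ragged_right("1 Widget\n2 Gadget Deluxe\n", [2]))
```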
Special
There are some other, non-traditional connection managers.
FTP: Defines a connection to an FTP server. For most situations, entering the server name and credentials is sufficient to define the connection. This is used with the FTP task to move and remove files or create and remove directories using FTP.
MSMQ: Defines a connection to a Microsoft Message Queue, used in conjunction with a Message Queue task to send or receive queued messages.
SMTP: Specifies the name of the Simple Mail Transfer Protocol (SMTP) server for use with the Send Mail task.
Containers
Containers provide important features for an SSIS package, including iteration over a group of tasks and isolation for error and event handling. The containers available are as follows.
Sequence: Simply contains a number of tasks without any iteration feature, but provides a shared event and error-handling context, allows variables to be scoped to the container level instead of the package level, and enables the entire container to be disabled at once during debugging.
For Loop: Provides the advantages of a Sequence container, and also runs its tasks repeatedly for as long as a loop expression evaluates to true.
Foreach Loop: Provides iteration over the contents of the container, but based on various lists of items:
  File: Each file matched by a wildcarded directory specification.
  Item: Each item in a manually entered list.
  ADO: Each row in a variable containing an ADO recordset or ADO.NET data set.
  SMO: A list of server objects, such as jobs or databases.
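The two looping containers can be pictured with ordinary loops. The sketch below is a Python analogy only. The `init`, `eval_expr` and `assign` arguments loosely mirror the initialization, evaluation and assignment expressions of a For Loop container, while `foreach_file` imitates the File enumerator with a wildcard match. All function names and file names are hypothetical.

```python
import fnmatch


def for_loop(init, eval_expr, assign, tasks):
    """Run every task repeatedly while the evaluation expression holds."""
    state = init()
    log = []
    while eval_expr(state):
        for task in tasks:
            log.append(task(state))
        state = assign(state)
    return log


def foreach_file(file_names, pattern, task):
    """Run the task once per file name matching a wildcard pattern."""
    results = []
    for name in sorted(file_names):
        if fnmatch.fnmatchcase(name, pattern):
            results.append(task(name))
    return results


# Three passes of a single task, counting from 0.
print(for_loop(lambda: 0, lambda i: i < 3, lambda i: i + 1,
               [lambda i: f"run {i}"]))

# One pass per matching file; notes.txt is skipped.
print(foreach_file(["b.csv", "a.csv", "notes.txt"], "*.csv",
                   lambda name: f"load {name}"))
```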
Sources
OLE DB: The preferred method of reading database data. It requires an OLE DB connection manager.
DataReader: Uses an ADO.NET connection manager to read database data.
Flat File: Requires a Flat File connection manager.
Excel: Uses an Excel connection manager and either worksheets or named ranges as tables.
Raw: Reads a file written by an SSIS Raw File destination.
XML: Reads a simple XML file and presents it to the data flow as a table, using either an inline schema or a separate schema file.
Destinations
OLE DB: Writes rows to a table or view for which an OLE DB driver exists.
SQL Server: Uses the same fast-loading mechanism as the Bulk Insert task, but is restricted in that the package must execute on the SQL Server that contains the target table or view.
Flat File: Writes the data flow to a file specified by a Flat File connection manager.
Excel: Sends rows from the data flow to a sheet or range in a workbook using an Excel connection manager. However, a worksheet can hold at most 65,536 rows of data.
Transformations
Between the source and destination, transformations provide functionality to change the data from what was read into what is needed. Each transformation requires one or more data flows as input and provides one or more data flows as output.
Aggregate: Functions rather like a GROUP BY query in SQL, generating Min, Max, Average and so on from the input.
Conditional Split: Enables rows of a data flow to be split between different outputs depending upon the contents of the row. Configure it by entering output names and expressions in the editor.
Derived Column: Uses expressions to generate values that can either be added to the data flow or replace existing columns.
Lookup: Finds rows in a database table that match the data flow and includes selected columns in the data flow. For example, a product ID could be added to the data flow by looking up the product name in the master table.
Merge Join: Provides SQL JOIN functionality between data flows sorted on the join columns.
Pivot: Denormalizes a data flow, similar to the way an Excel pivot table operates, making attribute values into columns.
Script: The Script component introduces scripting into the data flow.
Slowly Changing Dimension: Compares the data in a data flow to a dimension table and, based on the roles assigned to particular columns, maintains the dimension.
Sort: Sorts the rows in a data flow by selected columns.
Union All: Combines multiple data flows with matching columns into a single data flow.
Unpivot: Makes a data flow more normalized by making columns into attribute values.
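To make a few of these transformations concrete, here is a small Python sketch that treats a data flow as a list of rows and chains Derived Column, Lookup, Conditional Split and Aggregate. It is an analogy under assumed sample data, not how SSIS implements them. The column names, the `product_master` table and the output names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows read by a source.
rows = [
    {"product": "Widget", "region": "East", "qty": 5, "price": 2.0},
    {"product": "Gadget", "region": "West", "qty": 1, "price": 10.0},
    {"product": "Widget", "region": "West", "qty": 3, "price": 2.0},
]
# Hypothetical master table: product name -> product ID.
product_master = {"Widget": 101, "Gadget": 102}


def derived_column(flow):
    """Derived Column: add a computed column to each row."""
    return [{**row, "amount": row["qty"] * row["price"]} for row in flow]


def lookup(flow, master):
    """Lookup: add the product ID by matching on product name.
    A missing name raises KeyError, standing in for a failed lookup."""
    return [{**row, "product_id": master[row["product"]]} for row in flow]


def conditional_split(flow):
    """Conditional Split: route each row to a named output."""
    outputs = {"East": [], "Default": []}
    for row in flow:
        outputs["East" if row["region"] == "East" else "Default"].append(row)
    return outputs


def aggregate(flow):
    """Aggregate: sum the amount per product, like GROUP BY product."""
    totals = defaultdict(float)
    for row in flow:
        totals[row["product"]] += row["amount"]
    return dict(totals)


flow = lookup(derived_column(rows), product_master)
split = conditional_split(flow)
totals = aggregate(flow)
print(totals)
print(len(split["East"]), len(split["Default"]))
```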
Package Configurations
Package configurations make it easier to move packages between servers and environments by providing a way to set properties within the package based on environment-specific configurations. For example, the server names and input directories might change between the development and production environments. There are several types of package configurations available:
Registry
Environment Variables
XML File
SQL Server Table
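The idea behind an XML File configuration can be sketched as a set of property-path overrides applied on top of design-time values. The Python below parses a sample shaped like an SSIS XML configuration file as far as I recall its layout, so treat the element and attribute names as an assumption to check against a file generated by the designer. The connection name `SalesDB`, the variable `User::InputDir` and all values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed shape of an XML configuration file; names and values are made up.
SAMPLE_CONFIG = r"""<DTSConfiguration>
  <Configuration ConfiguredType="Property"
      Path="\Package.Connections[SalesDB].Properties[ServerName]"
      ValueType="String">
    <ConfiguredValue>PRODSQL01</ConfiguredValue>
  </Configuration>
  <Configuration ConfiguredType="Property"
      Path="\Package.Variables[User::InputDir].Properties[Value]"
      ValueType="String">
    <ConfiguredValue>\\prodfiles\incoming</ConfiguredValue>
  </Configuration>
</DTSConfiguration>"""


def read_overrides(xml_text):
    """Return {property path: configured value} from the XML text."""
    root = ET.fromstring(xml_text)
    return {
        node.get("Path"): node.findtext("ConfiguredValue")
        for node in root.findall("Configuration")
    }


def apply_overrides(design_time, overrides):
    """Configured values win over the values saved in the package."""
    merged = dict(design_time)
    merged.update(overrides)
    return merged


# Values as saved in the package on the development machine.
design_time = {
    r"\Package.Connections[SalesDB].Properties[ServerName]": "DEVSQL01",
    r"\Package.Variables[User::InputDir].Properties[Value]": r"C:\dev\incoming",
}
effective = apply_overrides(design_time, read_overrides(SAMPLE_CONFIG))
for path, value in effective.items():
    print(path, "=", value)
```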
Check Point
Enabling checkpoint restart allows a package to restart without rerunning tasks that already completed successfully. The following are the basic rules for checkpoint restart.
Only Control Flow tasks define restart points. A Data Flow task is viewed as a single unit of work regardless of the number of components it contains.
Any transaction in progress is rolled back on failure, so the restart point must retreat to the beginning of the transaction. Thus, if the entire package executes as a single transaction, it will always restart at the beginning of the package.
Any loop containers are started over from the beginning of the current loop.
The configuration used on restart is the one saved in the checkpoint file, not the current configuration file.
Enable checkpoints by setting the following package properties.
CheckpointFileName: Name of the file in which to save checkpoint information.
CheckpointUsage: Set to either IfExists (starts at the beginning of the package if there is no checkpoint file, or at the restart point if the checkpoint file exists) or Always (fails if the checkpoint file does not exist).
SaveCheckpoints: Set to True.
In addition, the FailPackageOnFailure property must be set to True for the package and for every task or container that can act as a restart point.
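The restart rules above can be modeled in a few lines. The following is a simplified Python sketch of the behavior, not the SSIS implementation: it records completed task names in a JSON file (the real checkpoint file uses its own format), honors the `IfExists` and `Always` settings of CheckpointUsage, and deletes the file after a fully successful run. The function name `run_package` is hypothetical, and transactions and loop containers are left out.

```python
import json
import os


def run_package(tasks, checkpoint_file, checkpoint_usage="IfExists"):
    """Run (name, callable) tasks in order, skipping tasks recorded as done.

    Returns the names of the tasks executed in this run.
    """
    if checkpoint_usage not in ("Never", "IfExists", "Always"):
        raise ValueError(f"unknown CheckpointUsage: {checkpoint_usage}")

    completed = []
    if checkpoint_usage != "Never":
        if os.path.exists(checkpoint_file):
            # Restart: pick up the list of tasks that already succeeded.
            with open(checkpoint_file) as handle:
                completed = json.load(handle)
        elif checkpoint_usage == "Always":
            raise FileNotFoundError(checkpoint_file)

    executed = []
    for name, task in tasks:
        if name in completed:
            continue  # already succeeded in an earlier run
        task()  # an exception here leaves the checkpoint file in place
        executed.append(name)
        completed.append(name)
        if checkpoint_usage != "Never":
            with open(checkpoint_file, "w") as handle:
                json.dump(completed, handle)

    # The whole package succeeded, so the checkpoint is no longer needed.
    if checkpoint_usage != "Never" and os.path.exists(checkpoint_file):
        os.remove(checkpoint_file)
    return executed
```

If the second of two tasks raises on the first run, a second call with the same checkpoint file executes only that second task, which mirrors restarting at the failed Control Flow task.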
Error Handling
You can use the Event Handlers tab of the SSIS package development environment to handle errors, for example by placing tasks on the OnError event.
Executing
Once a package is created and tested, it can be executed in several ways.
Locate the installed package in SQL Server Management Studio, right-click it and choose Run Package, which in turn invokes dtexec for the selected package.
Run the dtexecui utility.
From a SQL Agent job step, choose SQL Server Integration Services Package as the step type.
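Since the options above ultimately invoke dtexec, a package can also be started from a script. The sketch below only builds a dtexec command line for a package stored as a file, using the `/File` and `/ConfigFile` options. The helper name and the paths are hypothetical, and actually running the command assumes dtexec is installed and on the PATH.

```python
def build_dtexec_command(package_path, config_file=None):
    """Build the argument list for running a file-based package with dtexec."""
    command = ["dtexec", "/File", package_path]
    if config_file:
        # Optional XML configuration file applied at run time.
        command += ["/ConfigFile", config_file]
    return command


# Hypothetical package and configuration paths.
command = build_dtexec_command(r"C:\packages\LoadSales.dtsx",
                               r"C:\packages\Prod.dtsConfig")
print(" ".join(command))

# To execute on a machine where dtexec is available:
#   import subprocess
#   subprocess.run(command, check=True)
```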
Further Reading
Package Configurations: http://204.9.76.233/articles/dba/package_configuration_2005_p1.aspx
Data Cleansing: http://www.sql-server-performance.com/articles/dba/data_cleaning_ssis_p1.aspx
Import Text Files: http://204.9.76.233/articles/dba/import_text_files_ssis_p1.aspx
Slowly Changing Dimension