Chapter 4: Data Integration

The process of consolidating data from multiple applications and creating a unified view of
data assets is known as data integration. As companies store information in different databases,
data integration becomes an important strategy to adopt, as it helps business users combine data
from different sources. For example, consider an e-commerce company that wants to extract
customer information from multiple data streams or databases, such as marketing, sales, and
finance. In this case, data integration would help to consolidate the data arriving from the
various departmental databases and make it available for reporting and analysis.
Data integration is a core component of several different mission-critical data management
projects, such as building an enterprise data warehouse, migrating data from one or multiple
databases to another, and synchronizing data between applications. As a result, there are a
variety of data integration applications, technologies, and techniques used by businesses
to integrate data from disparate sources and create a single version of the truth. Now that you
understand what the data integration process is, let's dive into the different data integration
techniques and technologies.
Fundamentally, data integration is the combination of technical and business processes used to
combine data from different sources into meaningful and valuable information. Integration tools
such as SQL Server Integration Services (discussed later in this chapter) can extract and transform
data from a wide variety of sources, such as XML data files, flat files, and relational data sources,
and then load the data into one or more destinations.

Figure: Data Integration Example

The Most Common Data Integration Challenges:


1. Data is Not Available Where it Should Be: One of the most common business
integration challenges is that data is not where it should be. When data is scattered
throughout the enterprise, it becomes hard to bring it all together in one place. The risk
of missing a crucial piece of data is always present: it could be hidden in obscure files,
or an ex-employee could have saved data in a different location and left without
informing their peers.

2. Data Collection Latency and Delays: In today's world, data needs to be processed in
real time to produce accurate and meaningful insights. But if developers complete the
data integration steps manually, this is simply not possible, and data collection is
delayed: by the time developers have collected last week's data, this week's data is
already waiting, and so on. Automated data integration tools solve this problem
effectively. These tools collect data in real time without forcing enterprises to waste
valuable resources in the process.

3. Wrong and Multiple Formats: Another common challenge of system integration is the
multiplicity of data formats. The data saved by the finance department will be in a
format that differs from how the sales team presents its data. Comparing and combining
unstructured data from different formats is neither effective nor useful. An easy solution
is to use data transformation tools. These tools analyze the formats of data and convert
them to a unified format before adding the data to the central database (a minimal sketch
of this kind of standardization appears after this list). Some data integration and business
analytics tools already have this as a built-in feature, which reduces the number of errors
you need to check and resolve manually when collecting data.

4. Lack of Quality Data: We have an abundance of data, but how much of it is even worth
processing? Is all of it useful for the business? What if you process incorrect data and
make decisions based on it? These are challenges every organization faces when it
starts data integration, and using low-quality data can result in long-term losses for an
enterprise. How can this issue be solved? Data quality management lets you validate
data well before it is added to the warehouse. This saves you from moving unwanted
data from its original location into the data warehouse: your database will only house
high-quality data that has been validated as genuine.

5. Numerous Duplicates in the Data Pipeline: Having duplicates in the data warehouse will
lead to long-term problems that impact your business decisions. Hiring data integration
consulting services can help you eliminate data silos by creating a comprehensive
communication channel between departments. When employees share data across
departments, the need to create and save duplicate data naturally decreases. Standardizing
validated data will also ensure that employees know which data to consider. Investing in
technology is vital, but ensuring transparency across the entire system is equally important.

6. Lack of Understanding of Available Data: What use is data if employees don't
understand it or know what to do with it? Not every employee has the same skills,
which makes it hard for some of them to interpret data. For example, the IT department
is comfortable discussing data in technical terms; the same cannot be said for employees
from the finance or HR departments, who use terms specific to their own fields of
expertise. Consulting companies that offer data integration services help create a
common vocabulary that can be used throughout the enterprise, like a glossary shared
with every employee to explain what a certain term or phrase means. This reduces
miscommunication and mistakes caused by misinterpreting existing data.

7. Existing System Customizations: Your existing systems have most likely already been
customized to suit specific business needs. Bringing in more tools and software can
complicate things if they are not compatible with each other. One of the data integration
features you should invest in is support for multiple deployment options. Whether
on-premises or on a cloud platform, and whether linking with an existing system or
building a new one to suit a data-driven model, data integration services can include
ways to combine different systems and bring them together on the same platform.

8. No Proper Planning and Approach to Data Integration: Data integration is not
something you decide and start implementing overnight. You first need to understand
your business processes, create an environment in which employees can communicate
and learn, and then start integrating data from different corners of the enterprise. Lack
of planning is one of the common data integration challenges in healthcare, where data
has to come from numerous sources that include many third-party entities. Everyone
involved in the process needs to know why data integration is taking place and how
they can use the analytics to improve their efficiency and productivity. Transparency
and communication will solve the problem.

9. The Number of Systems and Tools Used in the Enterprise: Most enterprises use
multiple platforms based on the type of software employees need, and the same goes
for the systems and tools used in different departments; the marketing team relies on
software the HR team never touches. With so many systems involved, gathering data
is a complex task that needs cooperation from every employee. An easy way to collect
data from multiple systems and tools is to use pre-configured integration software that
can work with almost any business setup, so you don't need to invest in separate tools
to extract data from each source.

10. No Data Security: Any list of data integration challenges must include data security,
or the lack of it. How many businesses have been attacked by cybercriminals in recent
times? Neither industry giants nor small startups have been spared. Data leaks, data
breaches, and data corruption can leave the enterprise vulnerable to further cyberattacks,
and it can be weeks or months before you even recognize the damage. Data integration
services that offer end-to-end solutions address data security by enhancing the security
systems in the business, ensuring that only authorized employees can access the data
warehouse to add, delete, or edit the information stored there.

11. Extracting Valuable Insights from Data: A complaint several businesses make is that
they are unable to extract valuable insights after data integration. How can data
integration issues be avoided when there is no proper planning? The effective solution
that enterprises often overlook before investing in data integration is exactly that:
planning, as mentioned in the previous points. SMEs need to know what they want to
achieve before investing in any system; unless the long-term goals are clear, you cannot
choose the right strategy to reach them. You will also need analytical tools that can be
integrated with the data warehouse. This ensures a continuous cycle in which data is
collected, processed, and analyzed, and reports are generated to help you improve your
business.
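
To illustrate the format problem from challenge 3, here is a minimal Python sketch of format
standardization. The two departmental exports, their field names, and their formats are invented
for illustration; commercial transformation tools perform the same kind of normalization at much
larger scale.

```python
from datetime import datetime

# Hypothetical exports: finance uses ISO dates and cents, sales uses US dates and dollars.
finance_rows = [{"customer": "Acme", "date": "2023-07-01", "amount_cents": 125000}]
sales_rows = [{"customer": "Acme", "date": "07/15/2023", "amount_usd": 980.50}]

def normalize_finance(row):
    # Convert the finance export into the unified format.
    return {
        "customer": row["customer"],
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        "amount_usd": row["amount_cents"] / 100.0,
    }

def normalize_sales(row):
    # Convert the sales export into the same unified format.
    return {
        "customer": row["customer"],
        "date": datetime.strptime(row["date"], "%m/%d/%Y").date(),
        "amount_usd": row["amount_usd"],
    }

# Once both feeds share one format, the central database can compare and combine them.
unified = [normalize_finance(r) for r in finance_rows] + [normalize_sales(r) for r in sales_rows]
print(unified)
```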

4.1 Types of Data Integration Storage


The need for data integration arises when data comes in from various internal and external
sources. Integration is achieved using one of three data integration techniques (consolidation,
federation, or propagation, described below), depending on the disparity, complexity, and
volume of the data sources involved.
The most common types of data storage are:
Database: The simplest and most familiar way to store data includes both relational databases
and NoSQL data stores and may not require data transformation at all.
Data Warehouse: Adds a dimensional level to the data structure to show how data types relate
to one another and usually requires a transformation step to make data ready for use in an
analytics system.
Object Storage: Stores large amounts of unstructured data, such as sensor data, audio and video
files, photos, etc., in their native format in simple, self-contained repositories that include the
data, metadata, and a unique ID number. The metadata and ID number allow applications to
locate and access the data.
Data Lake: Collects raw and unstructured data in a single storage system, often object
storage, to be transformed and used later. Data lakes hold vast amounts of a wide variety of
data types and make processing big data and applying machine learning and AI possible.
Data Lakehouse: Serves as a single platform for data warehousing and data lake by
implementing data warehouses’ data structures and management features for data lakes.
Combining the two solutions brings storage costs down, reduces data movement and
redundancy, and saves administration time.
The reporting databases are known by different names in different enterprises: operational data
store (ODS), staging database, central repository, data warehouse (DW), or enterprise data
warehouse (EDW).
Data integration is software development. Data integration lifecycle management, or DILM, is
the art and science of managing data integration in the modern enterprise.
A well-known data integration platform is SQL Server Integration Services (SSIS), released in
November 2005 as the successor to Data Transformation Services (DTS).
Based on location and data model, integration can take several forms, described below.
4.1.2 Data Consolidation
As the name suggests, data consolidation is the process of combining data from different data
sources to create a centralized data repository or data store. This unified data store is then used
for various purposes, such as reporting and data analysis. In addition, it can also perform as a
data source for downstream applications.
One of the key factors that differentiate data consolidation from other data integration
techniques is data latency. Data latency is defined as the amount of time it takes to retrieve data
from data sources and transfer it to the data store. The shorter the latency period, the fresher
the data available in the data store for business intelligence and analysis.
Generally speaking, there is usually some level of latency between the time updates occur in
the source systems and the time those updates are reflected in the data warehouse or data store.
Depending on the data integration technologies used and the specific needs of the business,
this latency can be a few seconds, hours, or more. However, with advancements in data
integration technologies, it is possible to consolidate data and transfer changes to the
destination in near real time or real time.
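
To make the idea concrete, here is a minimal Python sketch of data consolidation using in-memory
SQLite databases as stand-ins for departmental source systems; the table and column names are
purely illustrative. The more frequently such a copy job runs, the lower the data latency of the
central store.

```python
import sqlite3

# Stand-ins for two departmental source systems (hypothetical schemas).
marketing = sqlite3.connect(":memory:")
marketing.execute("CREATE TABLE leads (customer TEXT, campaign TEXT)")
marketing.execute("INSERT INTO leads VALUES ('Acme', 'spring_promo')")

sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (customer TEXT, total REAL)")
sales.execute("INSERT INTO orders VALUES ('Acme', 980.50)")

# Central store: all source data is physically copied into one repository.
central = sqlite3.connect(":memory:")
central.execute("CREATE TABLE customer_activity (customer TEXT, source TEXT, detail TEXT)")

for customer, campaign in marketing.execute("SELECT customer, campaign FROM leads"):
    central.execute("INSERT INTO customer_activity VALUES (?, 'marketing', ?)", (customer, campaign))
for customer, total in sales.execute("SELECT customer, total FROM orders"):
    central.execute("INSERT INTO customer_activity VALUES (?, 'sales', ?)", (customer, str(total)))

print(central.execute("SELECT * FROM customer_activity").fetchall())
```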
4.1.3 Data Federation
Data federation is a data integration technique that is used to consolidate data and simplify
access for consuming users and front-end applications. In the data federation technique,
distributed data with different data models is integrated into a virtual database that features a
unified data model.
There is no physical data movement happening behind a federated virtual database. Instead,
data abstraction is done to create a uniform user interface for data access and retrieval. As a
result, whenever a user or an application queries the federated virtual database, the query is
decomposed and sent to the relevant underlying data sources. In other words, in data federation
the data is served on an on-demand basis, unlike data consolidation, where data is physically
integrated into a separate centralized data store.
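
A minimal sketch of the federation idea, assuming two hypothetical SQLite sources: nothing is
copied into a central store; instead, one logical request is decomposed into per-source queries
and the results are combined on demand.

```python
import sqlite3

# Underlying sources keep their data; nothing is moved to a central repository.
hr = sqlite3.connect(":memory:")
hr.execute("CREATE TABLE employees (emp_id INTEGER, name TEXT)")
hr.execute("INSERT INTO employees VALUES (1, 'Priya')")

payroll = sqlite3.connect(":memory:")
payroll.execute("CREATE TABLE salaries (emp_id INTEGER, salary REAL)")
payroll.execute("INSERT INTO salaries VALUES (1, 72000.0)")

def federated_employee_view(emp_id):
    """Decompose one logical query into per-source queries and combine the answers on demand."""
    name = hr.execute("SELECT name FROM employees WHERE emp_id = ?", (emp_id,)).fetchone()
    salary = payroll.execute("SELECT salary FROM salaries WHERE emp_id = ?", (emp_id,)).fetchone()
    return {"emp_id": emp_id,
            "name": name[0] if name else None,
            "salary": salary[0] if salary else None}

print(federated_employee_view(1))  # served on demand; no physical data movement
```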
4.1.4 Data Propagation
Data propagation is another data integration technique, in which data from an enterprise data
warehouse is transferred to different data marts after the required transformations. Since data
in the warehouse continues to be updated, changes are propagated to the dependent data marts
in a synchronous or asynchronous manner. The two common data integration technologies used
for data propagation are enterprise application integration (EAI) and enterprise data
replication (EDR).
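
A simplified sketch of data propagation under assumed table names: rows updated in a warehouse
table since the last run are pushed down to a data mart. Such a job could run on a schedule
(asynchronous) or be invoked after each warehouse update (synchronous).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enterprise data warehouse table and a finance data mart table (hypothetical schemas).
conn.execute("CREATE TABLE dw_sales (id INTEGER, region TEXT, amount REAL, updated_at TEXT)")
conn.execute("CREATE TABLE mart_finance_sales (id INTEGER, amount REAL)")
conn.execute("INSERT INTO dw_sales VALUES (1, 'EU', 100.0, '2023-07-01'), (2, 'US', 250.0, '2023-07-02')")

def propagate_changes(since):
    """Push rows updated in the warehouse after `since` down to the data mart."""
    changed = conn.execute(
        "SELECT id, amount FROM dw_sales WHERE updated_at > ?", (since,)).fetchall()
    conn.executemany("INSERT INTO mart_finance_sales VALUES (?, ?)", changed)
    return len(changed)

# Called on a schedule (asynchronous) or after each warehouse update (synchronous).
print(propagate_changes("2023-06-30"), "rows propagated")
```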

4.2 How Does Data Integration Add Value?


While the data integration solutions listed above inevitably add value by saving time and
money, data integration also underpins much larger concepts and processes. The data
management methods listed below are key examples where data integration is an essential part
of the process, though there are many more applications that data integration can support
beyond those listed here.
Business Intelligence (BI) : Business intelligence is an umbrella term describing the process of
using technology to analyze business data to help make better business decisions. Prior to using
these tools, it is essential that data is structured, cleaned, and prepared for analysis. The data
can also be used to generate informative visual reports.
Decision Making: It is vital that decision-makers have an in-depth understanding of all
necessary information to help their organization thrive. Identifying what strategies to use and
what steps to take cannot be done effectively when data is left unstructured, is siloed, or is
difficult to access.
Master Data Management (MDM): MDM by definition sounds very similar to data integration
itself; however, data integration occurs a step before the actual master data management is
done. MDM requires specific policies and guidelines that the data administrators enforce to
create a "single version of the truth" for the end user.
Customer/Company Relationship: By consolidating and managing customer information in a
structured manner, you will inevitably be able to provide better customer service. Customer
data integration (CDI) can help create a more efficient data management system that allows
your representatives to easily access and query customer data as needed.
Data Virtualization: Data virtualization allows a user to access, manipulate, and query data
without needing access to the actual data storage location. To virtualize data effectively, having
a well-constructed back-end structure is key for data to be properly maintained. This will allow
for front-end applications and self-service solutions to function optimally.

4.3 Different Data Integration Technologies


Data integration technology has evolved at a rapid pace over the last decade. Initially, Extract,
Transform, Load (ETL) was the only available technology for batch data integration. However,
as businesses continued to add more sources to their data ecosystems and the need for real-time
data integration grew, new technologies were introduced:
Extract, Transform, Load (ETL)
Probably the best-known data integration technology, ETL (Extract, Transform, Load) is a
process that extracts data from a source system, transforms it, and loads it into a target
destination such as a data warehouse. The process has three sub-processes: E for Extract, T for
Transform, and L for Load. Data is extracted from the source database in the extraction step,
transformed into the required format, and then loaded into the destination data warehouse. The
tools that perform these functions are called ETL tools.
ETL is used primarily for data consolidation and can be conducted in batches or in a near-real-
time manner using change data capture (CDC). Batch ETL is mostly used for bulk movements
of data, such as during data migration. CDC, on the other hand, is a more suitable choice for
transferring changed or updated data to the target destination.
During the ETL process, data is extracted from a database, ERP solution, cloud application, or
file system and transferred to another database or data repository. The transformations
performed on the data vary depending on the specific data management use case, but common
ones include data cleansing, data quality checks, data aggregation, and data reconciliation.
Enterprise Information Integration (EII)
Enterprise Information Integration (EII) is a data integration technology used to deliver curated
datasets on an on-demand basis. Also considered a type of data federation technology, EII
involves the creation of a virtual layer or a business view of underlying data sources. This layer
shields the consuming applications and business users from the complexities of connecting to
disparate source systems having different formats, interfaces, and semantics. In other words,
EII is a technology that allows developers and business users alike to treat a range of data
sources as if they were one database and present the incoming data in new ways.
Unlike batch ETL, EII can handle real-time data integration and delivery use-cases very easily,
allowing business users to consume fresh data for data analysis and reporting.
Enterprise Data Replication (EDR)
Used as a data propagation technique, Enterprise Data Replication (EDR) is a real-time data
consolidation method that involves moving data from one storage system to another. In its
simplest form, EDR involves moving a dataset from one database to another database having
the same schema. However, the process has become more complex over time, involving
disparate source and target databases, with data being replicated at regular intervals, in real
time, or sporadically, depending on the needs of the enterprise.
While both EDR and ETL involve bulk movement of data, EDR is different because it does
not involve any kind of data transformation or manipulation.
In addition to these three key data integration technologies, enterprises with complex data
management architectures also make use of Enterprise Application Integration (EAI), Change
Data Capture (CDC), and other event-based and real-time technologies to keep up with the data
needs of their business users.
4.3.1 Extract, Transform, Load (ETL)
There are three steps that make up the ETL process and enable data to be integrated from source
to destination. Those steps are data extraction, data transformation and data loading.
Figure: ETL Process
Step 1: Extraction:
❑ The Extract step covers the data extraction from the source systems and makes it
accessible for further processing.
❑ The main objective of the extract step is to retrieve all the required data from the source
systems with as few resources as possible.
❑ Before data can be moved to a new destination, it must first be extracted from its source.
In this first step of the ETL process, structured and unstructured data is imported and
consolidated into a single repository. Raw data can be extracted from a wide range of
sources, including:
❖ Existing databases and legacy systems
❖ Cloud, hybrid, and on-premises environments
❖ Sales and marketing applications
❖ Mobile devices and apps
❖ CRM systems, Data storage platforms, Data warehouses
Step 2: Transformation (Staging Area)
❑ During this phase of the ETL process, rules and regulations can be applied that ensure
data quality and accessibility.
❑ The process of data transformation is comprised of several sub-processes:
a) Cleansing — inconsistencies and missing values in the data are resolved.
b) Standardization — formatting rules are applied to the data set.
c) De-duplication — redundant data is excluded or discarded.
d) Verification — unusable data is removed and anomalies are flagged.
e) Sorting — data is organized according to type.
❑ Transformation is generally considered to be the most important part of the ETL
process. Data transformation improves data integrity and helps ensure that data arrives
at its new destination.
Step 3: Loading
❑ The final step in the ETL process is to load the newly transformed data into a new
destination. Data can be loaded all at once (full load) or at scheduled intervals
(incremental load).
❖ Full loading — In an ETL full loading scenario, everything that comes from the
transformation assembly line goes into new, unique records in the data
warehouse.
❖ Incremental loading — A less comprehensive but more manageable approach
is incremental loading.
❑ Incremental loading compares incoming data with what’s already on
hand, and only produces additional records if new and unique
information is found.
This architecture allows smaller, less expensive data warehouses to maintain and
manage business intelligence.
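The three steps above can be sketched in a few lines of Python. This is a toy pipeline under
assumed data, not a production ETL tool: the CSV sample and the warehouse schema are
invented for illustration, and the transformation stage shows cleansing, de-duplication,
verification, and sorting in miniature.

```python
import csv, io, sqlite3

# Extract: read raw rows from a source (a CSV export stands in for a source system here).
raw_csv = "customer,amount,date\nAcme, 125.50 ,2023-07-01\nAcme,125.50,2023-07-01\nGlobex,,2023-07-02\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform (staging area): cleanse, standardize, de-duplicate, verify, sort.
seen, staged = set(), []
for r in rows:
    amount = r["amount"].strip()        # standardization: trim stray whitespace
    if not amount:                      # verification: drop unusable rows
        continue
    key = (r["customer"], amount, r["date"])
    if key in seen:                     # de-duplication
        continue
    seen.add(key)
    staged.append({"customer": r["customer"], "amount": float(amount), "date": r["date"]})
staged.sort(key=lambda r: r["date"])    # sorting

# Load: write the transformed rows into the destination warehouse table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales (customer TEXT, amount REAL, date TEXT)")
dw.executemany("INSERT INTO sales VALUES (:customer, :amount, :date)", staged)
print(dw.execute("SELECT * FROM sales").fetchall())
```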
Benefits of ETL:
1) Automated processes save you time: The beauty of ETL lies in its ability to collect,
transform, and assemble data in an automated manner.
2) Complex data is no longer a challenge: The data you work with is sure to be complex and
varied; it probably includes different timestamps, currencies, campaign names, location
coordinates, customer names, device IDs, and sellers.
3) Human error is no longer an issue: As careful as marketers may be with their data, they
are not immune to making mistakes.
4) It is the path to better decision-making: By automating crucial data processes and
minimizing the chance of error, ETL provides the high-quality data that is at the heart of
making strong business decisions.
4.3.2 EII (Enterprise Information Integration)
Enterprises have long been storing their data in a myriad of disparate systems such as relational
databases, mainframes, different operating systems, free text, and hierarchical repositories.
The need for organizations to identify and correlate related but separate data has long been
recognized. Using new and existing data assets in an efficient, integrated, and interchangeable
manner has become the key to surviving, and thriving, in the new economy, and EII provides a
strategic advantage toward this goal.
EII (Enterprise Information Integration) is the software capability to present the data and
information of an entire organization as a unified view so that it can be managed as a single
source. EII tools use the concept of virtual databases to provide access to multiple databases.
EII provides a means for real-time data integration and allows access to this data through a
single data layer.

Characteristics
• Supports a variety of data sources
• SQL-based API
• Real-time programming model
• Location transparency
• Automatic data type conversion services
• Ability to join, union, aggregate, and otherwise correlate data from multiple
sources in a single query
• Ability to create individual views based on data integrated from multiple sources
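
As a rough illustration of the last two characteristics, the following Python sketch attaches two
separate SQLite databases (a hypothetical CRM store and a hypothetical billing store) behind one
connection so that a single SQL statement can join them, much as an EII virtual layer shields
consumers from where the data physically lives. The file and table names are assumptions.

```python
import os, sqlite3, tempfile

# Two separate source databases stand in for disparate systems (hypothetical CRM and billing).
tmp = tempfile.mkdtemp()
crm_path = os.path.join(tmp, "crm.db")
billing_path = os.path.join(tmp, "billing.db")

crm = sqlite3.connect(crm_path)
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Acme')")
crm.commit(); crm.close()

billing = sqlite3.connect(billing_path)
billing.execute("CREATE TABLE invoices (customer_id INTEGER, total REAL)")
billing.execute("INSERT INTO invoices VALUES (1, 450.0)")
billing.commit(); billing.close()

# Virtual layer: both sources are attached so one SQL query can join across them.
view = sqlite3.connect(":memory:")
view.execute("ATTACH DATABASE ? AS crm", (crm_path,))
view.execute("ATTACH DATABASE ? AS billing", (billing_path,))
print(view.execute(
    "SELECT c.name, i.total FROM crm.customers c "
    "JOIN billing.invoices i ON i.customer_id = c.id").fetchall())
```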
4.4 Critical ETL components:
Regardless of the exact ETL process you choose, there are some critical components you’ll
want to consider:
• Support for change data capture (CDC): Incremental loading allows you to update
your analytics warehouse with new data without doing a full reload of the entire data
set (a minimal sketch appears after this list).
• Auditing and logging: You need detailed logging within the ETL pipeline to ensure
that data can be audited after it’s loaded and that errors can be debugged.
• Handling of multiple source formats: To pull in data from diverse sources such as
Salesforce’s API, your back-end financials application, and databases such as MySQL
and MongoDB, your process needs to be able to handle a variety of data formats.
• Fault tolerance: In any system, problems inevitably occur. ETL systems need to be
able to recover gracefully, making sure that data can make it from one end of the
pipeline to the other even when the first run encounters problems.
• Notification support: If you want your organization to trust its analyses, you have to
build in notification systems to alert you when data isn’t accurate. These might include:
• Proactive notification directly to end users when API credentials expire
• Passing along an error from a third-party API with a description that can help
developers debug and fix an issue
• If there’s an unexpected error in a connector, automatically creating a ticket to
have an engineer look into it
• Utilizing systems-level monitoring for things like errors in networking or
databases
• Low latency: Some decisions need to be made in real time, so data freshness is critical.
While there will be latency constraints imposed by particular source data integrations,
data should flow through your ETL process with as little latency as possible.
• Scalability: As your company grows, so will your data volume. All components of an
ETL process should scale to support arbitrarily large throughput.
• Accuracy: Data cannot be dropped or changed in a way that corrupts its meaning.
Every data point should be auditable at every stage in your process.
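A minimal sketch of the incremental-loading idea behind CDC, using an assumed updated_at
column and a small state table to remember the last sync point; real CDC implementations
typically read database logs or change-tracking tables instead, but the principle is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (id INTEGER PRIMARY KEY, total REAL, updated_at TEXT)")
conn.execute("CREATE TABLE warehouse_orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE etl_state (last_synced_at TEXT)")
conn.execute("INSERT INTO etl_state VALUES ('1970-01-01')")
conn.execute("INSERT INTO source_orders VALUES (1, 99.0, '2023-07-01'), (2, 45.0, '2023-07-03')")

def incremental_load(now):
    """Load only rows changed since the last run, instead of reloading the full table."""
    (last,) = conn.execute("SELECT last_synced_at FROM etl_state").fetchone()
    changed = conn.execute(
        "SELECT id, total FROM source_orders WHERE updated_at > ?", (last,)).fetchall()
    conn.executemany(
        "INSERT OR REPLACE INTO warehouse_orders VALUES (?, ?)", changed)  # upsert changed rows
    conn.execute("UPDATE etl_state SET last_synced_at = ?", (now,))
    return len(changed)

print(incremental_load("2023-07-04"), "rows loaded")   # first run: every row counts as new
print(incremental_load("2023-07-05"), "rows loaded")   # second run: nothing changed, 0 rows
```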
4.5 Data Warehousing:
Data warehousing is a process used to collect and manage data from multiple sources
into a centralized repository to drive actionable business insights. With all your data in
one place, it becomes simpler to perform analysis and reporting at different aggregate
levels. It is the core of the BI system and helps you make better business decisions. In
simple words, it is the electronic storage space for all your business data integrated from
different marketing and other sources.
Types of Data Warehousing:
❑ Enterprise Data Warehouse: An enterprise data warehouse is a centralized warehouse
that offers decision-making support to different departments across an enterprise. It
provides a unified approach for organizing and classifying data by subject.
❑ Operational Data Store: Popularly known as an ODS, an Operational Data Store supports
an organization's operational reporting needs. It can be refreshed in real time, making it
well suited for routine activities such as storing employee records.
❑ Data Mart: A data mart is designed for a specific business line such as finance,
accounts, sales, purchases, or inventory. This type of warehouse allows you to collect
data directly from the sources.
Data Warehouse Appliances
❑ Data Warehouse Appliances are a set of hardware and software tools used for storing
data.
❑ Every data-driven business uses these appliances to build a centralized and
comprehensive data warehouse, where all kinds of functional business data can be
stored.
Figure: DW Appliances
❑ Data warehousing is a process of combining data from multiple sources and organizing
it in a way that supports an organization's tactical and strategic decision making.
❑ The main purpose of a data warehouse is to provide a transparent picture of the business
at a given point in time.
Business Intelligence (BI): BI can be described as a set of tools and methods that facilitate
the transformation of raw data into meaningful patterns and useful insights, enabling better
business decisions.
❑ The process of BI involves data preparation, analytics, and visualization.
❑ BI(Business Intelligence) is a set of processes, architectures, and technologies that
convert raw data into meaningful information that drives profitable business actions.
❑ It is a suite of software and services to transform data into actionable intelligence and
knowledge.
❑ Business Intelligence is an umbrella term used alongside data analytics. It is a process
that performs data preparation, analytics, and visualization.
❑ Data warehousing, by contrast, describes tools that combine data from disparate sources,
clean the data, and prepare it for analysis.
❑ BI has a direct impact on an organization's strategic, tactical, and operational business
decisions. BI supports fact-based decision making using historical data rather than
assumptions.
❑ BI tools perform data analysis and create reports, summaries, dashboards, maps,
graphs, and charts to provide users with detailed intelligence about the nature of the
business.
Figure: Business Intelligence
Business intelligence software and systems:
❑ A variety of different types of tools fall under the business intelligence umbrella. The
software selection service breaks down some of the most important categories and
features:
❖ Dashboards
❖ Visualizations
❖ Reporting
❖ Data mining
❖ ETL (extract, transform, load — tools that import data from one data store into
another)
❖ OLAP (online analytical processing)

4.6 Data Integration Methods and Strategies:

❑ Manual data integration
❑ Middleware data integration
❑ Application-based integration

1) Manual Data Integration Approach: Manual data integration describes the process of a
person manually collecting the necessary data from different sources by accessing them
directly. The data is cleaned as needed, and stored in a single warehouse. This method of
data integration is extremely inefficient and makes sense only for small organizations with
an absolute minimum of data resources. There is no unified view of the data.
It occurs when a data manager oversees all aspects of the integration — usually by writing
custom code. That means connecting the different data sources, collecting the data, and
cleaning it, etc., without automation.

In this approach, a web-based user interface or an application is created for users of the
system to show all the relevant information by accessing all the source systems directly.
There is no unification of data in reality.

2) Middleware data integration: Middleware is software that links two separate
applications; the term is commonly used to describe products that function as glue between
two separate applications. For instance, various middleware products establish a connection
between a web server and a database system. Middleware data integration acts as a mediator
and helps to normalize data before bringing it into the master data pool. Older legacy
applications often don't work well with new applications, and middleware offers a solution
when data integration systems cannot access data coming from one of these legacy
applications.
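
A minimal sketch of middleware acting as glue between a legacy application and a newer system:
a hypothetical fixed-width legacy record is parsed and normalized before being handed to the
master data pool. The record layout and field names are invented for illustration.

```python
# Fixed-width legacy export: name, yyyymmdd date, amount in cents (hypothetical layout).
legacy_record = "ACME      |19990701|0001250"

def legacy_adapter(record):
    """Middleware layer: parse the legacy format and normalize it for the master data pool."""
    name, raw_date, raw_amount = [part.strip() for part in record.split("|")]
    return {
        "customer": name.title(),
        "date": f"{raw_date[:4]}-{raw_date[4:6]}-{raw_date[6:]}",  # ISO 8601
        "amount": int(raw_amount) / 100.0,
    }

def load_into_master_pool(row, pool):
    pool.append(row)   # stands in for the newer application's ingestion step

master_pool = []
load_into_master_pool(legacy_adapter(legacy_record), master_pool)
print(master_pool)  # [{'customer': 'Acme', 'date': '1999-07-01', 'amount': 12.5}]
```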
Integration middleware is classified based on domains, which are defined by the types of
resources that are incorporated:
1) Cloud Integration: Integrates with and between cloud services, cloud-based
applications (SaaS), private clouds, trade hubs, and other typical cloud resources through
web services and standard B2B communication strategies (FTP, AS2, etc.)
2) B2B Integration: Integrates customer, provider, and various other partner
interfaces with various data resources and company-managed applications
3) Application Integration (A2A): Integrates various company-managed applications
together, including cloud-based and remote systems

1) Cloud Integration:

❑ Cloud integration is a system of tools and technologies that connects various
applications, systems, repositories, and IT environments for the real-time exchange of
data and processes.
❑ Once combined, the data and integrated cloud services can then be accessed by
multiple devices over a network or via the internet.
Figure: Cloud Integration

Benefits of cloud Integration:

❑ Companies that use cloud integration have synchronized data and applications,
improving their ability to operate effectively and nimbly.
❑ Other benefits include:
❖ Improved operational efficiency
❖ Increased flexibility and scalability
❖ Faster time-to-market
❖ Better internal communication
❖ Improved customer service, support, and retention
❖ Increased competitive edge
❖ Reduced operational costs and increased revenue
4.7 SSIS Is a Software Development Platform

❑ SQL Server Integration Services is a platform for building enterprise-level data
integration and data transformation solutions.
❑ SSIS is an ETL tool (Extract, Transform, Load), which is very much needed for data
warehousing applications. SSIS is used to perform operations such as loading data
based on need, performing different transformations on the data (such as calculating
sums, averages, etc.), defining the workflow of the process flow, and performing
day-to-day activities.
❑ Use Integration Services to solve complex business problems by copying or
downloading files, loading data warehouses, cleansing and mining data, and managing
SQL Server objects and data.

Why we use SSIS?

❑ SSIS tool helps you to merge data from various data stores
❑ Automates Administrative Functions and Data Loading
❑ Populates Data Marts & Data Warehouses
❑ Helps to clean and standardize data and Building BI into a Data Transformation
Process.
❑ Automating Administrative Functions and Data Loading
❑ SSIS contains a GUI that helps users transform data easily rather than writing large
programs, and it can load millions of rows from one data source to another in a very
few minutes
❑ Identifying, capturing, and processing data changes
❑ Coordinating data maintenance, processing, or analysis
❑ SSIS eliminates the need for hardcore programmers
❑ SSIS offers robust error and event handling.

SSIS Salient Features

❑ Studio Environments
❑ Relevant data integration functions
❑ Effective implementation speed
❑ Tight integration with other Microsoft SQL family
❑ Data Mining Query Transformation
❑ Fuzzy Lookup and Grouping Transformations
❑ Term Extraction and Term Lookup Transformations
❑ Higher speed data connectivity components such as connectivity to SAP or Oracle

Components of SSIS Architecture

❑ Control Flow (stores containers and tasks)
❑ Data Flow (sources, destinations, transformations)
❑ Event Handler (sending of messages, emails)
❑ Package Explorer (offers a single view of everything in the package)
❑ Parameters (user interaction)

1. Control Flow
Control flow is the brain of an SSIS package. It arranges the order of execution for all its
components. The components contain containers and tasks, which are managed by
precedence constraints.
2. Precedence Constraints
Precedence constraints are package components that direct tasks to execute in a predefined
order. They also define the workflow of the entire SSIS package, controlling the execution
of two linked tasks by executing the destination task based on the result of the earlier
task.
3. Data Flow
The main use of the SSIS tool is to extract data into the server's memory, transform it, and
write it to another destination. If Control Flow is the brain, Data Flow is the heart of SSIS.
4. Containers
Containers are units for grouping tasks together into units of work. Apart from offering
visual consistency, they also allow you to declare variables and event handlers within the
scope of a specific container.
Two common types of containers in SSIS are: 1) the Sequence Container and 2) the Loop Container.
1) Sequence Container: allows you to organize subsidiary tasks by grouping them.
2) For Loop Container: provides the same functionality as the Sequence Container
except that it runs the tasks multiple times.
SSIS Packages:
Another core component of SSIS is the notion of a package. A package is a collection of
tasks that execute in an orderly fashion; precedence constraints help manage the order in
which the tasks execute. A package can be saved as a file or stored on SQL Server in the
package catalog database.

Figure: SSIS Package


Advantages of using SSIS:

❑ Broad documentation and support
❑ Ease and speed of implementation
❑ Tight integration with SQL Server and Visual Studio
❑ Standardized data integration
❑ Offers real-time, message-based capabilities
❑ Support for a distribution model
❑ Helps you remove the network as a bottleneck by letting SSIS insert data directly into SQL Server
❑ SSIS allows you to use the SQL Server Destination

Disadvantages of SSIS

❑ Sometimes creates issues in non-Windows environments
❑ Unclear vision and strategy
❑ SSIS lacks support for alternative data integration styles
❑ Problematic integration with other products
