Data Integration in Practice: IPaaS Implementation
In our previous post, we discussed data integration, its challenges, and how Integration Platforms (IPaaS) help overcome them.
Today let’s dive into the technical details of IPaaS implementation.
Gathering and understanding requirements is key to designing an integration properly and identifying technical assumptions. Overall, we can distinguish types of integration, and ways of modeling them, based on their:
Endpoints - integration between two applications or data export/import for a single application (e.g. exporting data to a file for reporting or creating a backup).
Integration direction - one-way integration (from system A to system B) or two-way integration (changes flow in both directions). The two-way case is more complicated: it involves change tracking, proper versioning (defining which system is up to date and should be the source in the integration process), and resolving conflicts when concurrent changes are made in both systems.
Synchronization method - a periodic launch of an integration that processes a certain data set (batch), or a live synchronization with near-real-time (NRT) data processing. Batch integration is usually started manually (on demand) or on a schedule; a manual run is also a way to bring the source and destination systems back in sync. NRT integration, by contrast, starts as soon as a change occurs and is often based on events assigned to specific operations in the source system, which send notifications to the service responsible for starting the integration.
Processed data scope - a bulk synchronization can process either the full data set or only recent changes (delta). A delta synchronization downloads only the records modified since the last synchronization, based on the modification date recorded in the source system. If the source system does not record such a date, it may be necessary to store additional information about synchronized records in an intermediate location (e.g. database staging tables) and introduce additional logic to compare and filter the changes.
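To make the delta idea concrete, here is a minimal Python sketch of both approaches: filtering by modification date, and comparing against fingerprints kept in a staging location. It is independent of any IPaaS product; the record layout, the `modified_at` field, and all function names are assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def select_delta(records, last_sync):
    """Keep only records modified after the last successful synchronization."""
    return [r for r in records if r["modified_at"] > last_sync]


def next_watermark(records, last_sync):
    """Latest modification time seen, or the old watermark if nothing changed."""
    return max((r["modified_at"] for r in records), default=last_sync)


def fingerprint(record):
    """Stable hash of a record, usable when the source has no modification date."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def select_changed(records, staged):
    """Compare records with fingerprints stored by the previous run.

    `staged` maps record id -> fingerprint (a stand-in for a staging table).
    New records and records whose content changed are returned.
    """
    return [r for r in records if staged.get(r["id"]) != fingerprint(r)]


# Hypothetical sample data
source = [
    {"id": 1, "name": "Acme", "modified_at": datetime(2024, 1, 10, tzinfo=timezone.utc)},
    {"id": 2, "name": "Globex", "modified_at": datetime(2024, 2, 5, tzinfo=timezone.utc)},
    {"id": 3, "name": "Initech", "modified_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
]
last_sync = datetime(2024, 2, 1, tzinfo=timezone.utc)

delta = select_delta(source, last_sync)            # records 2 and 3
new_watermark = next_watermark(delta, last_sync)   # 2024-03-01, stored for the next run
```

The watermark is only advanced from records that were actually selected, so an empty run leaves it unchanged.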
Let's use Dell Boomi as an example of how IPaaS platforms work. In our example, we create a simple integration that retrieves data from Salesforce and stores it in a database, performing a simple ETL process (Extract, Transform, Load).
1. Data Extraction
Data extraction requires establishing a connection with the source system and performing a query. Boomi provides connector components for this purpose. Using a connector consists of configuring the connection data (URL, login, password, etc.) and the type of operation we want to perform. When defining the operation, we need to specify the data format that the system will return. Here we can use the connector’s functionality to import this format from the system automatically (the connector does this internally, using the metadata API).
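In Boomi, the query is configured in the connector operation rather than written by hand. Outside of any particular platform, though, a Salesforce extraction comes down to a query along these lines. The sketch below only builds a SOQL string; `build_soql` is a hypothetical helper, and the object and field names are examples of standard Salesforce fields, not something taken from a specific process.

```python
from datetime import datetime, timezone


def build_soql(sobject, fields, modified_since=None):
    """Build a SOQL query, optionally limited to recently modified records.

    `modified_since` must be a timezone-aware datetime. SOQL datetime
    literals are written without quotes, e.g. 2024-02-01T00:00:00Z.
    """
    query = f"SELECT {', '.join(fields)} FROM {sobject}"
    if modified_since is not None:
        stamp = modified_since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        query += f" WHERE LastModifiedDate > {stamp}"
    return query


full_query = build_soql("Account", ["Id", "Name", "BillingCountry"])
delta_query = build_soql(
    "Account",
    ["Id", "Name", "BillingCountry"],
    modified_since=datetime(2024, 2, 1, tzinfo=timezone.utc),
)
print(delta_query)
# SELECT Id, Name, BillingCountry FROM Account WHERE LastModifiedDate > 2024-02-01T00:00:00Z
```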
2. Data Loading
Before we define the data mapping (transformation), we configure loading data into the target system. Creating a database connector first lets us import the target data format automatically from the database table schema and then use it in the mapping, without having to create it manually. As with the source connector, we define the connection and the operation to perform on it. While configuring the operation, we can set its type to Dynamic Insert. The appropriate SQL command for storing the data is then generated automatically, based on the data format. Alternatively, we can define the SQL command (or a stored procedure call) manually, although in many cases automatic generation is sufficient.
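The idea of generating the insert statement from the table schema can be sketched in a few lines of Python. This is not how Boomi implements Dynamic Insert; it is only an analogy using the standard `sqlite3` module, and the `account` table and `build_insert` helper are made up for the example.

```python
import sqlite3


def build_insert(conn, table):
    """Generate a parameterized INSERT from the table's schema.

    `table` is assumed to be a trusted identifier (it is interpolated
    into the SQL text); values are always passed as parameters.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    placeholders = ", ".join(f":{c}" for c in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, name TEXT, city TEXT)")

insert_sql = build_insert(conn, "account")
# INSERT INTO account (id, name, city) VALUES (:id, :name, :city)

rows = [
    {"id": "001", "name": "Acme", "city": "Berlin"},
    {"id": "002", "name": "Globex", "city": "Warsaw"},
]
conn.executemany(insert_sql, rows)
conn.commit()
```

Because the statement is derived from the schema, adding a column to the table changes the generated SQL without any change to the loading code.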
3. Data Transformation
To map data from the Salesforce format (XML) to the database statement format, we will use a map component. Individual elements can be mapped between the selected formats intuitively, using drag & drop. Of course, mapping will not always be a simple 1:1. For more difficult cases, the map component provides additional functionality, such as reference tables, operations on values, custom scripts, and more.
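What a map component does visually can be expressed as a small set of mapping rules. The sketch below shows a direct 1:1 mapping, an operation on a value, and a lookup in a reference table. The XML shape is deliberately simplified, and the target column names, the `COUNTRY_CODES` table, and the helper functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up XML for one source record
SAMPLE = (
    "<Account>"
    "<Id>001</Id>"
    "<Name>  Acme  </Name>"
    "<BillingCountry>Germany</BillingCountry>"
    "<AnnualRevenue>1250000</AnnualRevenue>"
    "</Account>"
)

# Reference table: source value -> target value
COUNTRY_CODES = {"Germany": "DE", "Poland": "PL"}

# Target column -> rule that computes it from the source record
FIELD_MAP = {
    "id": lambda src: src["Id"],                                 # simple 1:1
    "name": lambda src: src["Name"].strip(),                     # operation on a value
    "country_code": lambda src: COUNTRY_CODES.get(src.get("BillingCountry")),  # lookup
    "annual_revenue": lambda src: int(src["AnnualRevenue"]),     # type conversion
}


def parse_record(xml_text):
    """Turn a flat XML record into a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


def transform(src):
    """Apply every mapping rule to one source record."""
    return {target: rule(src) for target, rule in FIELD_MAP.items()}


row = transform(parse_record(SAMPLE))
# {'id': '001', 'name': 'Acme', 'country_code': 'DE', 'annual_revenue': 1250000}
```

An unknown country yields `None` here, which is one possible choice; a real mapping would need an explicit rule for values missing from the reference table.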
4. Defining Data Flow
Once all operations are ready, components must be connected to define the data flow (we do this in the main integration process). After completion, we are ready to run the process and integrate the data.
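Conceptually, connecting the components means chaining the three steps so that each record flows from extraction through transformation to loading. A minimal sketch of such a flow, with basic error collection, might look as follows. The stand-in `extract`, `transform`, and `load` functions are placeholders invented for the example, not Boomi components.

```python
def run_process(extract, transform, load):
    """Run extract -> transform -> load, collecting records that fail."""
    loaded, failed = 0, []
    for record in extract():
        try:
            load(transform(record))
            loaded += 1
        except (KeyError, ValueError) as exc:
            # Keep going; report the bad record instead of stopping the run
            failed.append((record, repr(exc)))
    return loaded, failed


# Stand-ins for the source connector, the map, and the target connector
def extract():
    yield {"Id": "001", "Name": "Acme"}
    yield {"Id": "002"}  # incomplete record: no Name


def transform(src):
    return {"id": src["Id"], "name": src["Name"]}


target = []
loaded, failed = run_process(extract, transform, target.append)
print(loaded, len(failed))  # 1 1
```

Passing the steps in as functions keeps the flow itself unchanged when a source, mapping, or target is replaced.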
Real-world integrations are rarely this simple. Often, we need to define more complex data mapping, handle errors and edge cases, combine data from various sources (e.g. from different API services), and so on. However, this example shows how easy it is to build a working and extensible integration without in-depth knowledge. IPaaS-delivered components allow us to build a high-level integration without dealing with deep technical details, thus saving time and effort.
Let’s talk about how our global IPaaS team can help you facilitate your data integration and reap the benefits of Integration Platforms.