Integrations in Y42
In Y42, integrations extract data from various data sources and load it into the data warehouse. They do this through connectors that handle authenticating user accounts with applications or databases, selecting specific tables and columns, and importing the data from the source into the destination data warehouse.
Setting up integrations is usually the first step to unlocking your data journey within Y42.
Every execution inside Y42 is considered a job. You can see more about this concept here.
For integrations, an import is the act of a connector extracting data from a data source and loading it into a data warehouse.
Import Types
When setting up a new integration, a full import is required before the integration can be used. During this step, the integration loads data from the selected tables available in your source, from the start date until the current day, into the designated destination.
After the first successful full import, you can trigger another full import or an incremental import manually, or schedule one in the Orchestration. An incremental import extracts and loads only data that has been added or modified, on a pre-defined schedule.
In some cases, a full import may need to be re-run to fix a data integrity error or to handle a schema change. This process is called a re-import.
Start date
The start date is the point in time from which the integration extracts historical data from the data source. Some APIs limit the timeframe available to load.
Full Import
Full imports update all tables from the historical date until the current day.
During a full import, the integration gets all data in the source from the start date to the current day and copies it into the destination. Full imports include all your selected data.
How long an import takes depends on the amount of data and the limitations of the source.
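As a minimal sketch of the idea, a full import can be pictured as copying every selected row dated between the start date and today. The function name `full_import`, the `day` field, and the sample rows are hypothetical and only illustrate the concept; they are not part of Y42.

```python
from datetime import date


def full_import(source_rows, start_date, today):
    """Return every source row dated between start_date and today, inclusive."""
    return [row for row in source_rows if start_date <= row["day"] <= today]


# Hypothetical source data: one row falls before the start date.
source = [
    {"day": date(2022, 12, 30), "clicks": 4},
    {"day": date(2023, 1, 2), "clicks": 9},
    {"day": date(2023, 1, 5), "clicks": 7},
]

destination = full_import(source, start_date=date(2023, 1, 1), today=date(2023, 1, 5))
```

Only the two rows on or after the start date reach the destination in this sketch.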
Incremental Import
This import enables you to extract and load new and modified data from your source into your destination. Incremental imports are efficient because they update only new information, instead of re-importing the whole table again.
Incremental Import for Applications
For an import to be incremental, the source table must have a suitable replication key. Some applications provide a cursor field such as updated_at or created_at, but sometimes only a primary key, such as task_id, is available.
The existence and type of a replication key are determined by the application. They influence whether the incremental import works and whether it replicates only new records or updated records as well.
Y42 also supports Change Data Capture (CDC) for databases, a powerful feature for replicating all your data in near real time.
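The effect of the replication key type can be sketched as follows. This is an illustration under stated assumptions, not Y42's implementation: `incremental_import`, the bookmark handling, and the sample tasks are hypothetical, and dates are ISO strings so that plain comparison orders them correctly.

```python
def incremental_import(source_rows, destination, replication_key, bookmark):
    """Upsert rows whose replication key is greater than the saved bookmark.

    Returns the new bookmark to store for the next run.
    """
    new_bookmark = bookmark
    for row in source_rows:
        if row[replication_key] > bookmark:
            destination[row["task_id"]] = dict(row)
            new_bookmark = max(new_bookmark, row[replication_key])
    return new_bookmark


# Hypothetical source state after a first import that ended on 2023-01-02:
# task 1 was edited afterwards and task 3 is new.
source = [
    {"task_id": 1, "title": "edited", "created_at": "2023-01-01", "updated_at": "2023-01-03"},
    {"task_id": 2, "title": "unchanged", "created_at": "2023-01-02", "updated_at": "2023-01-02"},
    {"task_id": 3, "title": "new", "created_at": "2023-01-04", "updated_at": "2023-01-04"},
]

by_updated, by_created = {}, {}
next_updated = incremental_import(source, by_updated, "updated_at", "2023-01-02")
next_created = incremental_import(source, by_created, "created_at", "2023-01-02")
```

With updated_at as the key, both the edited task and the new task are picked up. With created_at, only the new task is picked up, so the edit to task 1 never reaches the destination.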
Incremental Import for Databases
Log-based replication uses the transaction logs that some databases implement natively as part of their core functionality.
The replication tool reads these logs to identify changes in a database, such as INSERT, UPDATE, or DELETE operations, and replicates them.
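Conceptually, replaying a change log against a destination might look like the sketch below. The event format (`op`, `key`, `row`) and the function `apply_change_log` are assumptions made for illustration; real transaction logs use database-specific formats.

```python
def apply_change_log(destination, change_log):
    """Replay ordered change events against a destination keyed by primary key."""
    for event in change_log:
        op, key = event["op"], event["key"]
        if op in ("INSERT", "UPDATE"):
            # Inserts and updates both leave the latest version of the row.
            destination[key] = event["row"]
        elif op == "DELETE":
            destination.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return destination


# Hypothetical log: two inserts, one update, one delete.
log = [
    {"op": "INSERT", "key": 1, "row": {"status": "open"}},
    {"op": "INSERT", "key": 2, "row": {"status": "open"}},
    {"op": "UPDATE", "key": 1, "row": {"status": "done"}},
    {"op": "DELETE", "key": 2},
]

replica = apply_change_log({}, log)
```

Because the log records deletes as well as inserts and updates, the replica ends up holding only row 1 in its updated state.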
Check our database documentation for coverage of CDC and Incremental import.
When log-based CDC is not possible, the standard incremental import for databases uses the key-based method.
Key-Based Incremental Data Replication
Key-based incremental replication is a method in which the data source identifies new and updated data using a column called the replication key. The key can be a timestamp, integer, or datestamp column that exists in the source table, such as task_id, insert_at, or updated_at.
Note: Selecting an auto-incrementing integer ID as the replication key may cause the replication to miss updated rows, because the ID of an updated row stays the same.
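The note above can be reproduced with a small sketch using Python's built-in sqlite3 module. The `tasks` table, the `fetch_since` helper, and the bookmark values are hypothetical and only illustrate the key-based method; they do not describe how Y42 queries a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, title TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [(1, "draft", "2023-01-01"), (2, "review", "2023-01-02")],
)


def fetch_since(conn, replication_key, bookmark):
    """Select rows whose replication key is greater than the saved bookmark."""
    # Column names cannot be bound as SQL parameters, so the key is
    # checked against a fixed list before being placed in the query.
    if replication_key not in ("task_id", "updated_at"):
        raise ValueError("unsupported replication key")
    query = (
        "SELECT task_id, title, updated_at FROM tasks "
        f"WHERE {replication_key} > ? ORDER BY {replication_key}, task_id"
    )
    return conn.execute(query, (bookmark,)).fetchall()


# After the first import the bookmarks are task_id = 2 and updated_at = "2023-01-02".
# Row 1 is then edited and row 3 is added in the source.
conn.execute("UPDATE tasks SET title = 'final', updated_at = '2023-01-03' WHERE task_id = 1")
conn.execute("INSERT INTO tasks VALUES (3, 'publish', '2023-01-03')")

by_id = fetch_since(conn, "task_id", 2)
by_timestamp = fetch_since(conn, "updated_at", "2023-01-02")
```

Using task_id as the key returns only the new row 3, while using updated_at returns both the edited row 1 and the new row 3.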
Re-Import
A re-import is simply a full import of an existing table. Running a full import again completely overwrites the data in your destination with the data from your data source.
It is used to fix data integrity issues in selected tables, which arise when historical data changes in the source and those changes cannot be replicated into your data warehouse. When this occurs, your table is out of sync, and re-importing it solves the issue.
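A rough sketch of why overwriting resolves an out-of-sync table is shown below. The `re_import` function and the sample rows are hypothetical; the point is only that the destination is rebuilt from the current source rather than patched.

```python
def re_import(source_rows):
    """Rebuild the destination table from scratch using the current source rows."""
    return {row["id"]: dict(row) for row in source_rows}


# Hypothetical out-of-sync state: in the source, row 1 was corrected
# and row 2 was removed after they had already been imported.
destination = {
    1: {"id": 1, "amount": 10},
    2: {"id": 2, "amount": 20},
}
source = [{"id": 1, "amount": 15}]

destination = re_import(source)
```

After the re-import, the destination holds only row 1 with the corrected amount, matching the source.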
Integration features
Each integration in Y42 can have some of the capabilities described in the upcoming paragraphs. The capabilities of an integration are detailed in its respective guide, or in the integration selection menu within the app.
Full Import
A full import replicates all data from your data source into your data warehouse. It brings in all data from the tables you selected, from the start date until today.
Incremental Import
Incremental imports enable importing only newly added data into the data warehouse, without the need to import all historical data again.
For some integrations, the incremental import loads only new data from the source. For others, it is also possible to replicate changes, that is, data that was already imported but has since changed in the data source.
Y42 calls this ability to load changes in the data Retroactive Updating. The reference documentation lists the integrations that support it.
Retroactive Updating
Retroactive updating is the ability to recognize historical updates in data and replicate them from your data source into your data warehouse. This functionality depends on the underlying data source and API. Please check the reference documentation of the respective integration to learn more about the supported functionality.
Start Date Selection
Some APIs allow you to define the timeframe of historical data to be loaded, while others have it pre-defined. Y42 exposes this configuration whenever it is available. How far back the selection can go also varies by data source.
Dynamic Column Selection
The ability to specify which columns to import from a specific table. Whether this option is available may depend on the API and on the individual table.
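In effect, column selection is a projection applied to each row before it is loaded. A minimal sketch, with a hypothetical `select_columns` helper and made-up column names:

```python
def select_columns(rows, columns):
    """Keep only the chosen columns from each row, in the order given."""
    return [{column: row[column] for column in columns} for row in rows]


# Hypothetical source table with three columns; only two are selected.
rows = [
    {"task_id": 1, "title": "draft", "internal_note": "do not export"},
    {"task_id": 2, "title": "review", "internal_note": "n/a"},
]

selected = select_columns(rows, ["task_id", "title"])
```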
Custom Data
As part of complete schema replication, Y42 replicates custom data whenever it exists and is accessible. Not all sources that have custom data expose it in a way we can access.
Custom data includes custom objects, tables, and fields that you have configured in the source system to better suit your business needs. Custom objects are specific to your source, for example, custom Salesforce objects that match your business process.
There is no special action you need to take to make sure we replicate your custom data. It will happen automatically for the systems that enable it.
Templates
Templates are pre-defined selections of a table's schema. They are designed for analytical integrations, which let you select different combinations of columns to produce different reports from the same table; a template selects the columns for you.
For a comprehensive list of the templates we support, refer to the guide.
Updated about 2 years ago