How to create an incremental Model
Don't forget the parentheses in your
is_incremental()
macro! Video uses the old version without parentheses.
What is Incremental modeling?
Incremental modeling is the process of importing and running only the new rows of data based on the conditions criteria. The new rows are appended to your already imported/created dataset.
Y42 provides you the ability to append new data in your pipeline in a faster way continuously.
Why use incremental modeling?
Let's suppose you have millions of records in your Paypal integration. You use a SQL model to model those raw data inputs and create your desired report.
Every day new records are added to your Paypal records, so you build an Orchestration and schedule it to run every day, to generate those new records in the Paypal import and then to run the query of the SQL model.
Doing this process costs because you are writing the data in your warehouse and the process takes time also.
Here is when Incremental models come to the rescue.
By putting a condition like for example created_at
> max(created_at)
you will run the query only for the rows that satisfy the matching criteria and then append the new rows to the old dataset of the model.
Please note that incremental modeling will only append data that is matching the condition!
How to create an incremental model?
To create an incremental model you need first to know how to create a simple SQL model, the process is the same.
Example case:
We have initially this raw input data for which we want to calculate the sum of debt grouped by day, where debt is less than 1000 and before 31/12/2022.
I. Create the SQL query in SQL models.
selectÂ
  `created_at`,
  round(sum(`debt`), 2) as sum_debt
from {{ ref('src/Models/SQL/incremental_input/table_1') }}
where `debt` < 1000 and `created_at`< '2022-12-31'
group by 1
order by 1
The result in the SQL model will be:
II. Decide which condition criteria you will use for the incremental models.
The next two days your integration is enriched with 3 new rows:
For this example we will use the created_at
column (as you might notice it could be as well the id
column as a valid condition criteria).
III. Place the condition at the end of the SQL model query.
In our SQL model we simply add the condition {% if is_incremental() %}
, using the supported Jinja templating at the Y42 code editor, at the end of the query, and add the condition inside the tags.
Important note:
When you have more than one condition (as in the example), you need to use the
and
operator inside the incremental clause, and place all other conditions above the incremental clause.Alternatively, you can either wrap your model query in a CTE or prepend all conditions with a placeholder
1=1
condition, which allows the concatenation of all following conditions using the and operator.
{% if is_incremental() %}
    where `created_at` > (SELECT max(created_at) from {{ this }})
{% endif %}
The whole SQL model will look like this:
selectÂ
  `created_at`,
  round(sum(`debt`), 2) as sum_debt
from {{ ref('src/Models/SQL/incremental_raw/table_1') }}
where `debt` < 1000 and `created_at`< '2022-12-31'
{% if is_incremental() %}
and `created_at` > (SELECT max(created_at) from {{ this }})
{% endif %}
group by 1
order by 1
Another option is to encapsulate your SQL model logic within a CTE and append the incremental clause to the outer query:
with model_logic as
(
selectÂ
  `created_at`,
  round(sum(`debt`), 2) as sum_debt
from {{ ref('src/Models/SQL/incremental_raw/table_1') }}
where `debt` < 1000 and `created_at`< '2022-12-31'
)
select *
from model_logic
{% if is_incremental() %}
where `created_at` > (select max(created_at) from {{ this }})
{% endif %}
group by 1
order by 1
The incremental Jinja is a replacement that gets added when is_incremental()
returns true. Therefore, the table must be imported incrementally!
You need to trigger the SQL model incrementally for it to work! If you fully trigger it, the changes we made will not have any effect.
IV. Commit the model
For the changes to take effect as always we need to commit the changes we made.
The first import will always be a Full import!
V. Trigger the SQL model Incrementally.
Now that the changes are loaded you can try the Incremental modeling.
You can trigger the SQL modeling manually or with an orchestration.
- Triggering the SQL model manually
To trigger the SQL model manually you need to go to the Jobs section of the model and simply click the Incremental import button.
- Triggering the SQL model via an Orchestration
You can trigger an incremental model via an Orchestration. To set up an orchestration read this article.
The only thing to have in consideration while setting up the orchestration is that it has to be set as an incremental import!
Common questions and answers for Incremental modeling
In the incremental modeling, are we only doing appending or are we also doing merge right now?
Answer: We append only at the moment, but we have the merge operation on the roadmap!
When I switch on incremental modeling on a node, is it enough or do I need to also write is_incremental()
in my SQL?
is_incremental()
in my SQL?Answer: You need to write is incremental
in your SQL model.
How do I do incremental UI modeling?
Answer: At the moment we do not do incremental modeling in UI model (only in SQL model), but we will implement it in the future.
What if, in a SQL Node, I trigger full import manually, but write is_incremental()
in the query?
is_incremental()
in the query?Answer: In that case, a full import will be triggered!
What if in a SQL Node, I switch on incremental import on UI, but don’t write is_incremental()
in the query?
is_incremental()
in the query?Answer: In that case, a full import will be triggered! You need to write is_incremental()
in your SQL model, for the incremental model to be valid!
Updated over 1 year ago