Data tests are a powerful tool offered by Y42 to assure the quality of your data.
Data arriving from different sources through the pipeline can be faulty, and errors are very difficult to spot, especially when a table has thousands or millions of rows.
This is where data tests come to the rescue. You can set them up easily through the Y42 UI. They can run:
- Manually, directly in the table, at any time after you set them up.
- Automatically, when the table containing the tests is triggered from an Orchestration (or when that table is triggered manually for a full or incremental import).
Pro tip: you can use data tests not only to check the quality of the data but also to support business decisions!
Note: The Data Test button will not be clickable unless you have at least one successful Job.
The first thing you can do is add the schema of the tests, which sets the initial conditions on which the actual tests are built.
This will be your first mandatory test!
In the schema you can:
- Add/Remove the Columns you will test further.
- Select the Column type (by default it is mapped to the actual column type).
- Select whether you accept null values in these columns.
- Select whether these columns are required (whether or not they must exist in the actual output table).
- Choose whether new columns may be added to the output table.
Note: You can change this setup anytime later.
You can preview the actual table data in the UI of the test setup, which makes it easier to decide on the conditions while setting up the tests.
Click the run button and the Table will be checked against the conditions you set up.
If everything matches then the test will be successful.
If one condition is changed intentionally, for example by switching the first column in the image from Float type to String, and the test is run again, it will fail.
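The pass and fail behaviour described above can be illustrated with a minimal Python sketch. It is not Y42's implementation: `ColumnRule`, `check_schema`, and the sample rows are hypothetical names invented here to mirror the schema options (column type, null values, required columns, new columns), assuming a table can be modelled as a list of dictionaries.

```python
from dataclasses import dataclass


@dataclass
class ColumnRule:
    """Hypothetical stand-in for one column entry in the schema test setup."""
    dtype: type            # expected column type
    nullable: bool = True  # are null (None) values accepted?
    required: bool = True  # must the column exist in the output table?


def check_schema(rows, rules, allow_new_columns=True):
    """Return a list of failure messages; an empty list means the test passed."""
    failures = []
    present = set()
    for row in rows:
        present.update(row)

    for name, rule in rules.items():
        if name not in present:
            if rule.required:
                failures.append(f"{name}: required column is missing")
            continue
        for i, row in enumerate(rows):
            value = row.get(name)
            if value is None:
                if not rule.nullable:
                    failures.append(f"{name}: null value in row {i}")
            elif not isinstance(value, rule.dtype):
                failures.append(
                    f"{name}: row {i} is {type(value).__name__}, "
                    f"expected {rule.dtype.__name__}"
                )

    if not allow_new_columns:
        for name in sorted(present - set(rules)):
            failures.append(f"{name}: unexpected new column")
    return failures


# Illustrative data only (assumption, not taken from a real table).
rows = [
    {"price": 9.99, "sku": "A-1"},
    {"price": 4.50, "sku": "B-2"},
]
matching = {"price": ColumnRule(float, nullable=False), "sku": ColumnRule(str)}
changed = {"price": ColumnRule(str, nullable=False), "sku": ColumnRule(str)}

print(check_schema(rows, matching))  # empty list: every condition matches
print(check_schema(rows, changed))   # one failure per row for the changed type
```

Returning a list of messages rather than a single boolean is a design choice of this sketch: it shows which condition broke, in the same way the UI lets you see why a run failed.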
After you click the save button, the first test (the Schema test) is saved.
Now you can add other types of tests.
You can add as many tests as you want.
In this step you can add:
- Column tests
- Row number Drift
- Unique Key
- Low-Code Condition
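To give a feel for what two of these test types check, here is a rough Python sketch. The exact definitions Y42 uses are not given here, so this rests on assumptions: a Unique Key test is taken to mean that no combination of key column values appears twice, and Row number Drift is taken to mean the relative change in row count between two runs. The function names and the 10% tolerance are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once (assumed Unique Key check)."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def row_count_drift(previous_count, current_count):
    """Relative change in row count between two runs (0.1 means +10%)."""
    if previous_count <= 0:
        raise ValueError("previous_count must be positive")
    return (current_count - previous_count) / previous_count


# Illustrative data only (assumption).
rows = [
    {"id": 1, "day": "mon"},
    {"id": 2, "day": "mon"},
    {"id": 1, "day": "mon"},
]
duplicates = duplicate_keys(rows, ["id", "day"])
print("unique key test passed:", not duplicates)

tolerance = 0.10  # hypothetical threshold
drift = row_count_drift(previous_count=1000, current_count=1200)
print("row drift test passed:", abs(drift) <= tolerance)
```

In this example both checks would fail: the key `(1, "mon")` appears twice, and the row count grew by roughly 20%, which exceeds the assumed 10% tolerance.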
To have the tests run every time the table is imported or triggered, trigger the table from an orchestration; this makes the process automatic and reliable.
This section lists the statuses of all test runs.