Why is it used for
The Function Node is used to manipulate columns based on a transformation in functions/expressions similar to SQL.
The user can manipulate columns by providing a function/expression for those specific columns.
Functionality - What can it do
Let’s assume that we have this data input:
Example 1
Let’s suppose a scenario where the user wants to calculate the average profit for each car make unit. To do so, the user must add a Function node and use the numeric function DIVIDE() by giving it two parameters, the total_profit, and the total_units_sold. To reflect these changes, the user has two options:
- Select an existing column to overwrite
- Create a new column
In our scenario, we shall go for option two, where we need to create a new column to reflect changes and name it avg_profit_per_unit.
Note: The user can name his column in any way he wishes to as long as it doesn’t match existing columns, otherwise changes made shall be overwritten in those columns.
Example 2
In this example, the user wishes to use the car_model_year column for further manipulation, but at its current state, this column’s data type is Text, which is required to be converted into a DateTime. To do so, the user must add a Function node and use the DateTime function PARSE_DATETIME() by giving it two parameters, the car_model_year column and the DateTime format of choice.
To reflect these changes, the user is selecting car_model_year as a target column because this is the column in which he wants these changes to be applied.
As shown in the output image, the car_model_year _ column has its data type changed from text to **_DateTime**.
Example 3
This other example demonstrates the usage of Text functions. The user wishes to apply a text transformation on the car_make column by making all the data lowercase for better mapping. To do so, the user must add a Function node and use the Text function LCASE() by giving it only one parameter, the car_make column
To reflect these changes, the user is selecting car_make as a target column because this is the column in which he wants these changes to be applied.
As shown in the output image, all the data corresponding to the car_make column are set to lowercase.
The Function Node is versatile. These basic scenarios are a few of many ways the user can manipulate data as there are lots of other functions available categorized in:
- Number Functions: These are the mathematical/ arithmetical calculations in which you can add, subtract, divide, or multiply numeric fields. It is also possible to return the minimum (LEAST) or maximum (GREATEST) value for a specific field. Furthermore, you can also calculate the Ceil, Floor, Round, or the absolute value of specific fields.
- The floor value is a type of rounded value that gives you the nearest integer down. For example, the floor of 5.21 is 5.
- The ceiling value is a type of rounded value that gives you the nearest integer up. For example, the ceiling of 5.21 is 6.
- The round function returns the value of a number rounded to the nearest integer which can be either up or down. For example, the rounded value of 5.21 is 5.
- The absolute value often denoted as |x|, is the distance of a number from 0. For instance, the absolute value of 5 is 5, and the absolute value of -5 is also 5.
- Text Functions: These are functions used to manipulate text or string fields. A few examples are trimming text, changing it into upper or lower cases, replacing text, splitting, and so on.
- Boolean Functions: These are functions that follow certain binary conditions and return either True or False. You can use the other function types above to create very powerful and complex queries. A few examples of boolean functions or operators are AND, OR, NOT, GREATER_THAN, LESS_THAN, CONTAINS, etc.
- Date + Time Functions: Finally, there are the date and time-related functions. These allow you to subtract dates, truncate date values, and cast data types to convert it to a DateTime or timestamp, for example.
Another usage for the node is to join columns together or add extra text to a column, displaying them in a new column or overwriting an existing column. For example, you could create a new column "full_name" that concats the columns "name" and "surname". Likewise, you may just create a new column with the same values in every row by just typing the text in between quotation marks.
Note: To use the Model feature you already should have imported data sources into Y42.
How to set it up
To set it up the user must connect the Function node to any other node of choice
- Click on the Function node to apply the node settings. On this node, besides the interface for setting up your node, or creating your functions, you have access to the documentation for each available function on the right side.
- You can choose to create a new column by typing it into the _ **_Target Column** box or overwrite an existing column by selecting it from the menu.
- Choose the function or the combination of functions you want to apply to your data calling the columns by using the @ symbol.
- When you're done, you can check the transformation by clicking Output. If you're satisfied, click Save.
Once you're done extracting the minimum or maximum of your columns, you can either perform further transformations or, if you want to export the new data table, click Commit Model.