JSON Extraction Node

A JSON Extraction Node allows you to extract from a column that has a JSON value only the desired property. A very simple example of JSON syntax can be:

{ "name": "John", "age": 22, "gender": "male", }

In this article, you will learn how to explore values containing JSON within Y42, and how to work with JSON Extraction Node depending on whether you're using BigQuery or Snowflake.

How to analyze values as JSON at Y42?

In any table preview of the node editors, you can right-click on a cell and choose "Analyze Value as JSON". This will open a sidebar showing the value in JSON syntax:

820

How to use JSON Extraction Node if you're using BigQuery?

Understanding JSON Objects and arrays

JSON stands for JavaScript Object Notation, this is a syntax used for storing and exchanging data. There are two ways data can be stored in JSON, as objects or as arrays.

This is a JSON Object

{
   "object": "value"
}

This is a JSON Array

[ 
   "value"
]

Despite of their visual representations being very alike, JSON Objects use key/value pairs to store their data, this values can be returned by calling their property. Meanwhile, the order of the Array determines the order of the information displayed. In other words, Array items are accessed using indices.

It is important to note that those two methods of storing data can be combined. Examples:

This is how an Array of Objects would look like:

[{ “position”: “winner”,”country”: “U.S.A” }, 
  { “position”: “runner”,”country”: “China” }];
[{ “rainy”: true,”sunny”: false },
  { “rainy”: false,”sunny”: true }]

While this is the syntax for Object of arrays and objects

{ “binary”: [{“id”:1,”name”:0},{“id”:2,”name”:1}],
  “boolean”:  [{“id”:1,”value”: false},{“id”:2,”value”: true}]}

And here it's how Nested Objects would look like

“results”: { “tennis”: [{“position”: “winner”,“country”: “U.S.A},
            {“position”: “runner”,“country”: “China”}] }

Confused? It's ok! Watch the following video for more information and further explanation on this advanced node, or contact us for support.

JSON Extraction node explanation video

Note: In order to use the Model feature you already should have already imported data sources into Y42.

https://www.loom.com/share/04efe1c5cb8e4e36a6c72bc2830bc246

Setting up your JSON Extraction Node

  • Click on the JSON Extraction Node to apply the settings.

  • Select the Target Column you wish to modify or write the name of a new column.
  • Find the Source Column.
  • Choose the Column Type, e.g. Array of Nested Objects, Array of Objects, Nested Object, Object, Array.
  • Add your Property details.
  • You can easily add other columns you want to extract by clicking on Add Condition, the big + button below.
  • When you are done, press Save.

Once you're done extracting your desired information, you can either perform further transformations or, if you want to export the new data table, click Commit Model.

How to use JSON Extraction Node if you're using Snowflake?

JSON files are structured differently within Snowflake in comparison with BigQuery. Therefore, Column Type and Key Index fields are removed.

This article aims to help users with finding a way to extract desirable properties. For reference, the Snowflake documentation can be found here.

Usage Notes

  • The node returns a column with NULL values if the path name does not correspond to any element in the original JSON string.
  • The path name syntax is a standard JavaScript notation and consists of a concatenation of:
    • Identifiers (keys) preceded by dots (e.g. .) to traverse a path in a JSON object
      • Example: to refer to the 3rd level key: level1_key.level2_key.level3_key
    • The path notation also supports SQL-style double-quoted identifiers, and the use of : as path separators.
      • Example: level1_key:level2_key:level3_key
    • Index operators (e.g. [<index>]), where the index is a non-negative integer referring to the position of an element in an array.
      • Example: to get the 2nd element in an array [a, b, c], use [1] as the first element is [0].

Sandbox Example

Let's have a look at how notations described above are used in simple cases.

  • Target Column Value is an example of what your single cell of the Target Column may contain within the table.
  • Property is an example of a query that you can put into the field.
  • Example#1-#3 are outputs you may expect after applying a certain property.

Target Column ValueProperty: key_1Property: key_1.key_2key_1.key_2[1]
Example #1{"key_1": "value_1"}value_1NULLNULL
Example #2{"key_1": {"key_2": "value_2"}}{"key_2": "value_2"}value_2NULL
Example #3{"key_1": {"key_2": ["zero", "one", "two"]}}{"key_2": ["zero", "one", "two"]}["zero", "one", "two"]one

Real Example

Let's have a look at the real example of a JSON object a user had to face. Below, we will show how to retrieve several elements by using Property queries.

age_max: 61
age_min: 17
excluded_custom_audiences:
  0:
    id: "23737171315940370"
    name: "Success Request Call / Trial (14d)"
  1:
    id: "23737171315940370"
    name: "Success Request Call / Trial (180d)"
flexible_spec:
  0:
    interests:
      0:
        id: "7003352289131"
        name: "Business intelligence"
      1:
        id: "60052720291"
        name: "Data science"
  1:
    interests:
      0:
        id: "7003111485497"
        name: "E-commerce"
      1:
        id: "6001451153316"
        name: "E-Comm"
     work_employers:
      0:
        id: "16173255050557443"
        name: "E-commerce Specialist"

Note: we refer to 0 and 1 as elements of an array, not keys of an object. This is due to the structure of the file:

“age_max”:55,
“age_min”:18,
“excluded_custom_audiences”:
[{“id”:“23737171315940370”,“name”:“Success Request Call / Trial (14d)“},
{“id”:“23737171315940370",“name”:“All Website Visitors (14d)“}
Desirable ElementProperty (Query)
61 (age_max)age_max
23737171315940370 (excluded_custom_audiences → 1 → id)excluded_custom_audiences[1].id
Data science (flexible_spec → 0 → 1 → name)flexible_spec[0].interests[1].name

Additional Notes:

  1. For now, we do not support the extraction of repeating elements. That is, for every "id" inside an array of objects [{id:…},{id:…},{id:…}] users have to create new Properties.
  2. When the output is NULL or results in an error, try to cut the last references through dot or bracket notations until you will see the output. Then, try to add additional references step-by-step. This will help to understand where the syntax error is coming from.