The FLATTEN function in Snowflake is a powerful table function designed to handle semi-structured data by converting nested arrays and objects into a relational format that can be easily queried and analyzed.
When working with JSON, ARRAY, or VARIANT data types, you often encounter hierarchical st…The FLATTEN function in Snowflake is a powerful table function designed to handle semi-structured data by converting nested arrays and objects into a relational format that can be easily queried and analyzed.
When working with JSON, ARRAY, or VARIANT data types, you often encounter hierarchical structures where data is nested within arrays or objects. The FLATTEN function takes these complex nested elements and expands them into separate rows, making the data accessible for standard SQL operations.
The basic syntax is: SELECT * FROM TABLE(FLATTEN(input => <column_name>)). The function accepts several parameters including INPUT (the data to flatten), PATH (specifies which element to flatten), OUTER (includes rows even when the input is empty), RECURSIVE (flattens nested structures at all levels), and MODE (specifies whether to flatten objects, arrays, or both).
When FLATTEN processes data, it returns several useful columns: SEQ (sequence number), KEY (key name for objects), INDEX (array position), VALUE (the actual flattened value), PATH (path to the element), and THIS (the element being flattened).
For example, if you have a JSON column containing an array of order items, FLATTEN will create one row per item in that array. This enables you to join this expanded data with other tables, perform aggregations, or apply filters to individual array elements.
The OUTER parameter is particularly valuable when you need to preserve parent rows even if the nested array is empty or null, similar to a LEFT OUTER JOIN behavior.
The RECURSIVE option allows you to flatten deeply nested structures in a single operation, traversing through multiple levels of hierarchy.
Understanding FLATTEN is essential for the SnowPro Core exam as it demonstrates proficiency in handling modern data formats and transforming semi-structured data into queryable results within Snowflake's architecture.
FLATTEN Function for Nested Data in Snowflake
Why FLATTEN is Important
The FLATTEN function is essential for working with semi-structured data in Snowflake, such as JSON, AVRO, and Parquet files. Modern data pipelines frequently ingest nested and hierarchical data structures, and FLATTEN allows you to transform these complex structures into relational rows that can be queried using standard SQL. This is a critical skill tested on the SnowPro Core exam.
What is FLATTEN?
FLATTEN is a table function that produces a lateral view of a VARIANT, OBJECT, or ARRAY column. It converts nested elements into separate rows, essentially 'exploding' arrays and objects into a tabular format. Each element in the nested structure becomes its own row in the output.
How FLATTEN Works
The basic syntax is:
SELECT * FROM table_name, LATERAL FLATTEN(input => column_name)
Key parameters include: - INPUT: The VARIANT, OBJECT, or ARRAY column to flatten - PATH: Optional path to a nested element within the input - OUTER: When TRUE, returns rows even when input is empty (similar to outer join) - RECURSIVE: When TRUE, flattens nested structures at all levels - MODE: Can be OBJECT, ARRAY, or BOTH (default is BOTH)
FLATTEN returns several columns: - SEQ: A unique sequence number - KEY: The key for objects, or index for arrays - PATH: The path to the element - INDEX: The index for array elements (NULL for objects) - VALUE: The value of the element - THIS: The element being flattened
Common Use Cases
1. Flattening JSON arrays: Extract individual items from an array stored in a VARIANT column 2. Accessing nested objects: Navigate through multiple levels of JSON hierarchy 3. Converting semi-structured to structured data: Transform JSON data into standard relational tables
Example
For a table with JSON data containing an array of orders: SELECT f.value:order_id::STRING as order_id, f.value:amount::NUMBER as amount FROM customers, LATERAL FLATTEN(input => customer_data:orders) f
Exam Tips: Answering Questions on FLATTEN
1. Remember LATERAL: FLATTEN is typically used with LATERAL keyword to correlate with the base table
2. Know the output columns: Exam questions often ask about SEQ, KEY, PATH, INDEX, VALUE, and THIS columns
3. Understand OUTER parameter: When OUTER is TRUE, rows with empty arrays or NULL values are preserved in the output. When FALSE (default), these rows are excluded
4. RECURSIVE vs non-recursive: Know that RECURSIVE TRUE will flatten all nested levels, while the default only flattens one level
5. MODE parameter: Questions may test whether you understand OBJECT mode (only objects), ARRAY mode (only arrays), or BOTH
6. Path notation: Be comfortable with dot notation and bracket notation for accessing nested elements (e.g., data:customer:address vs data['customer']['address'])
7. Casting output: Remember that VALUE returns VARIANT type, so you typically need to cast it (::STRING, ::NUMBER, etc.)
8. Multiple FLATTENs: For deeply nested structures, you may need to chain multiple FLATTEN operations
9. Performance consideration: FLATTEN can generate many rows from a single input row, which impacts query performance