File format objects in Snowflake are named database objects that define the format specifications for data files used during loading and unloading operations. These objects encapsulate all the configuration settings needed to properly parse incoming data or structure outgoing data, making data oper…File format objects in Snowflake are named database objects that define the format specifications for data files used during loading and unloading operations. These objects encapsulate all the configuration settings needed to properly parse incoming data or structure outgoing data, making data operations more efficient and reusable.
Snowflake supports several file format types including CSV, JSON, AVRO, ORC, PARQUET, and XML. Each format type has specific options that control how data is interpreted. For CSV files, you can specify field delimiters, record delimiters, skip header rows, null string representations, and escape characters. For semi-structured formats like JSON, you can configure strip outer array settings and date/time format specifications.
File format objects are created using the CREATE FILE FORMAT command and can be stored at the database or schema level. Once created, they can be referenced in COPY INTO statements, stage definitions, and external table configurations. This eliminates the need to repeatedly specify format options in each data operation.
Key benefits of using file format objects include consistency across multiple load operations, simplified SQL statements, easier maintenance when format specifications change, and the ability to share format definitions across teams. When you modify a file format object, all future operations referencing it will use the updated specifications.
You can associate file format objects with stages, which means any data operation using that stage will automatically apply the format settings. Alternatively, you can specify the file format at the time of the COPY command, which will override any stage-level format settings.
File format objects support various compression options including GZIP, BZ2, BROTLI, ZSTD, and DEFLATE. Snowflake can also auto-detect compression when loading data. Understanding file format objects is essential for the SnowPro Core exam as they form a fundamental component of Snowflakes data loading and unloading architecture.
File Format Objects in Snowflake: Complete Guide
What Are File Format Objects?
File format objects in Snowflake are named database objects that define the structure and properties of data files used during data loading and unloading operations. They act as reusable templates that specify how Snowflake should parse incoming files or generate outgoing files.
Why Are File Format Objects Important?
File format objects are essential for several reasons:
• Reusability: Define once and use across multiple COPY commands and stages • Consistency: Ensure uniform data handling across your organization • Simplification: Reduce complexity in COPY statements by referencing a single object • Maintenance: Update format settings in one place rather than multiple scripts • Error Reduction: Minimize mistakes by centralizing configuration
Supported File Format Types
Snowflake supports the following file format types:
• CSV - Comma-separated values (default) • JSON - Semi-structured JSON data • AVRO - Apache Avro format • ORC - Optimized Row Columnar format • PARQUET - Apache Parquet columnar format • XML - Extensible Markup Language
How File Format Objects Work
Creating a File Format:
CREATE FILE FORMAT my_csv_format TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1 NULL_IF = ('NULL', 'null') FIELD_OPTIONALLY_ENCLOSED_BY = '"';
Using in COPY Command:
COPY INTO my_table FROM @my_stage FILE_FORMAT = (FORMAT_NAME = my_csv_format);
Using Inline Options:
COPY INTO my_table FROM @my_stage FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '|');
Key CSV Format Options
• FIELD_DELIMITER: Character separating fields (default: comma) • RECORD_DELIMITER: Character separating records (default: newline) • SKIP_HEADER: Number of header lines to skip • FIELD_OPTIONALLY_ENCLOSED_BY: Character enclosing strings • NULL_IF: Strings to interpret as NULL • EMPTY_FIELD_AS_NULL: Treat empty fields as NULL • ERROR_ON_COLUMN_COUNT_MISMATCH: Error if column count differs • COMPRESSION: Compression type (AUTO, GZIP, BZ2, etc.)
Key Semi-Structured Format Options (JSON, Parquet, etc.)
• STRIP_OUTER_ARRAY: Remove outer array brackets from JSON • STRIP_NULL_VALUES: Remove NULL values from objects • DATE_FORMAT/TIME_FORMAT: Specify date and time formats • BINARY_FORMAT: Format for binary data (HEX, BASE64, UTF8)
File Format Object Scope and Ownership
• File formats are schema-level objects • They require a schema for creation • Ownership determines who can modify or drop the object • USAGE privilege allows referencing the format
Managing File Formats
• ALTER FILE FORMAT: Modify existing format properties • DROP FILE FORMAT: Remove a file format object • SHOW FILE FORMATS: List available file formats • DESC FILE FORMAT: View format definition
Exam Tips: Answering Questions on File Format Objects
1. Remember the hierarchy: File formats are schema-level objects, not database or account level
2. Know the six types: CSV, JSON, AVRO, ORC, PARQUET, and XML - memorize these
3. Default format is CSV: If no TYPE is specified, Snowflake assumes CSV
4. Inline vs Named: Understand that format options can be specified inline in COPY commands or through named file format objects
5. SKIP_HEADER is CSV-specific: This option only applies to CSV format types
6. STRIP_OUTER_ARRAY: Remember this is crucial for loading JSON arrays as separate rows
7. Compression AUTO: Snowflake can automatically detect compression when set to AUTO
8. Format precedence: Options specified in the COPY command override those in the named file format
9. Watch for trick questions: File formats define how to read data, not where data is located (that is the stage's job)
10. Semi-structured data: JSON, AVRO, ORC, and PARQUET are loaded into VARIANT columns or can be transformed during load