Directory tables in Snowflake are a powerful feature designed to manage and catalog files stored in stages, making it easier to work with unstructured and semi-structured data during data loading and unloading operations.
A directory table is essentially a built-in, read-only table that automatica…Directory tables in Snowflake are a powerful feature designed to manage and catalog files stored in stages, making it easier to work with unstructured and semi-structured data during data loading and unloading operations.
A directory table is essentially a built-in, read-only table that automatically catalogs all files contained within a stage. When you enable a directory table on a stage, Snowflake creates metadata entries for each file, including important attributes such as the file path, size, last modified timestamp, and an MD5 checksum for data integrity verification.
To enable a directory table, you can set the DIRECTORY parameter to TRUE when creating or altering a stage. Once enabled, you can query the directory table using the DIRECTORY() table function, which returns comprehensive file metadata. This allows you to programmatically discover and filter files before loading them into Snowflake tables.
Directory tables are particularly useful for several scenarios. First, they help you identify new or modified files in a stage, enabling incremental data loading patterns. Second, they support data lake architectures where files are continuously added to cloud storage locations. Third, they facilitate the management of unstructured data like images, PDFs, and audio files that need to be processed or analyzed.
The metadata in directory tables must be refreshed to reflect changes in the stage. You can manually refresh using the ALTER STAGE command with REFRESH option, or configure automatic refresh for stages pointing to cloud storage locations that support event notifications.
Key columns available in directory tables include RELATIVE_PATH for the file location, SIZE for file dimensions in bytes, LAST_MODIFIED for timestamp information, MD5 for checksum values, and ETAG for cloud provider-specific identifiers.
Understanding directory tables is essential for the SnowPro Core exam as they represent a modern approach to file management and are integral to building robust data pipelines in Snowflake.
Directory Tables in Snowflake: Complete Guide
What are Directory Tables?
Directory tables are a special type of table in Snowflake that automatically catalogs staged files and their metadata. They are created and associated with external or internal stages, providing a queryable interface to discover files stored in those stages.
Why Directory Tables are Important
Directory tables solve several critical challenges:
• File Discovery: They allow you to query and discover files in a stage using standard SQL • Metadata Access: Provide access to file metadata such as size, last modified date, and file URL • Data Lake Integration: Enable Snowflake to function as a data lake query engine by cataloging external data • Automation: Support building automated data pipelines by programmatically identifying new or changed files
How Directory Tables Work
1. Enabling Directory Tables: When creating a stage, set DIRECTORY = (ENABLE = TRUE)
2. Automatic Cataloging: Snowflake catalogs files in the stage, capturing metadata
3. Manual Refresh: Use ALTER STAGE stage_name REFRESH to update the directory table with new files
4. Auto-Refresh Option: For external stages on cloud storage, AUTO_REFRESH can be enabled for automatic updates via event notifications
Key Metadata Columns
• RELATIVE_PATH: The path to the file relative to the stage • SIZE: File size in bytes • LAST_MODIFIED: Timestamp of last modification • MD5: MD5 hash of the file (when available) • FILE_URL: Scoped URL to access the file
Querying Directory Tables
Use the DIRECTORY() function: SELECT * FROM DIRECTORY(@my_stage);
You can filter, join, and perform standard SQL operations on the results.
Use Cases
• Building external tables on staged data • Identifying files for COPY INTO operations • Monitoring file ingestion progress • Creating data catalogs for unstructured data • Generating pre-signed URLs for file access
Exam Tips: Answering Questions on Directory Tables
1. Remember the Syntax: Questions often test whether you know to use DIRECTORY() function with the stage name prefixed by @
2. Refresh Behavior: Know that directory tables require manual refresh using ALTER STAGE REFRESH unless AUTO_REFRESH is configured
3. Stage Association: Directory tables are always associated with a stage - they cannot exist independently
4. Metadata Columns: Be familiar with the columns returned - especially RELATIVE_PATH and FILE_URL
• Which command refreshes a directory table? (ALTER STAGE stage_name REFRESH) • How do you query files in a directory table? (SELECT FROM DIRECTORY(@stage_name)) • What metadata is available? (Size, path, last modified, file URL) • What is required for AUTO_REFRESH? (Cloud event notification service integration)