IAM and Organization Policies for Data Systems
Identity and Access Management (IAM) and Organization Policies are critical components for securing and governing data systems in Google Cloud Platform (GCP).

**IAM (Identity and Access Management):** IAM enables fine-grained access control by defining **who** (identity) has **what access** (role) to **which resource**. It follows the principle of least privilege, ensuring users and services have only the permissions necessary for their tasks. Key concepts include:

- **Members:** Users, service accounts, groups, or domains that need access.
- **Roles:** Collections of permissions. These include Basic roles (Owner, Editor, Viewer), Predefined roles (e.g., BigQuery Data Editor, Storage Admin), and Custom roles for tailored permissions.
- **Policies:** Bindings that attach roles to members at various resource hierarchy levels (organization, folder, project, or individual resources).
- **Service Accounts:** Special accounts used by applications and VMs to authenticate and interact with GCP APIs programmatically.

For data systems, IAM controls access to BigQuery datasets, Cloud Storage buckets, Pub/Sub topics, Dataflow jobs, and more. Column-level and row-level security in BigQuery further enhances data protection.

**Organization Policies:** Organization Policies provide centralized, top-down governance constraints across the resource hierarchy. Unlike IAM (which grants access), Organization Policies **restrict** what configurations are allowed. Key features include:

- **Constraints:** Rules such as restricting resource locations (e.g., data must stay in specific regions), disabling public access to Cloud Storage buckets, or enforcing uniform bucket-level access.
- **Inheritance:** Policies cascade down the hierarchy from organization to folders to projects, ensuring consistent compliance.
- **Common data-related policies:** Restricting external data sharing in BigQuery, enforcing encryption standards, preventing public datasets, and controlling VPC Service Perimeter configurations.

**Together in Data Systems:** IAM and Organization Policies complement each other. IAM manages granular access permissions for individuals and services, while Organization Policies enforce broad security guardrails across the entire organization. Combined with VPC Service Controls and Data Loss Prevention (DLP), they form a comprehensive data governance framework essential for regulatory compliance and data protection.
IAM & Organization Policies for Data Systems – GCP Professional Data Engineer Guide
Why IAM and Organization Policies for Data Systems Matter
In any enterprise cloud environment, securing data is paramount. Identity and Access Management (IAM) and Organization Policies form the foundational security layer that governs who can access what data resources, how they can interact with them, and what constraints are enforced across the entire organization. For a Professional Data Engineer, understanding these concepts is critical because:
• Data pipelines often span multiple projects, services, and teams — each requiring precise access control.
• Regulatory compliance (GDPR, HIPAA, PCI-DSS) demands strict governance over data access and residency.
• Misconfigurations in IAM are one of the leading causes of cloud data breaches.
• The exam heavily tests your ability to design secure, least-privilege data architectures.
What Is IAM in Google Cloud?
Google Cloud IAM allows you to manage access control by defining who (identity) has what access (role) for which resource. The core components are:
1. Members (Identities)
These are the entities that can be granted access:
• Google Account – A single user (e.g., user@example.com)
• Service Account – An identity for applications and compute workloads (e.g., my-sa@project-id.iam.gserviceaccount.com)
• Google Group – A collection of users and service accounts
• Google Workspace Domain – All users in an organization's domain
• Cloud Identity Domain – All users managed via Cloud Identity
• allAuthenticatedUsers – Any authenticated Google account
• allUsers – Anyone on the internet (use with extreme caution)
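In the IAM API, these member types appear as prefixed strings inside a policy binding. A minimal sketch of that shape (the email addresses and project ID below are hypothetical placeholders):

```python
# Member identifiers as they appear in an IAM policy binding.
# All emails and the project ID are made-up placeholders.
binding = {
    "role": "roles/bigquery.dataViewer",
    "members": [
        "user:alice@example.com",                                    # Google Account
        "serviceAccount:etl-sa@my-project.iam.gserviceaccount.com",  # Service Account
        "group:data-team@example.com",                               # Google Group
        "domain:example.com",                                        # Workspace / Cloud Identity domain
    ],
}

policy = {"version": 1, "bindings": [binding]}

# Every member string carries a type prefix; the two special values
# "allUsers" and "allAuthenticatedUsers" stand alone without one.
for member in binding["members"]:
    prefix = member.split(":", 1)[0]
    assert prefix in {"user", "serviceAccount", "group", "domain"}
```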
2. Roles
Roles are collections of permissions. GCP offers three types:
• Basic Roles (Primitive): Owner, Editor, Viewer – These are broad and generally not recommended for production data systems because they grant too many permissions.
• Predefined Roles: Fine-grained roles created by Google for specific services (e.g., roles/bigquery.dataViewer, roles/storage.objectAdmin, roles/dataflow.worker).
• Custom Roles: User-defined roles that bundle specific permissions when predefined roles don't match your exact requirements.
3. Policies
An IAM policy is a binding of one or more members to a role, attached to a resource. Policies are inherited down the resource hierarchy:
Organization → Folder → Project → Resource
A policy set at the organization level is inherited by all folders, projects, and resources beneath it. Policies are additive — you cannot remove an inherited permission at a lower level (with the exception of deny policies).
4. IAM Deny Policies
Deny policies allow you to explicitly deny specific permissions regardless of the allow policies. They take precedence over allow policies and are useful for creating guardrails (e.g., preventing anyone from deleting a critical BigQuery dataset).
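The additive-inheritance and deny-precedence rules can be sketched as a small evaluation function. This is a conceptual model only, not the real IAM engine; all members and permissions are hypothetical examples:

```python
# Conceptual model: allow bindings are unioned down the hierarchy,
# and deny policies override any allow. Not the real IAM evaluator.

# Allow grants at each hierarchy level (hypothetical values).
allow_grants = {
    "org":     {("group:data-team@example.com", "bigquery.datasets.get")},
    "project": {("user:bob@example.com", "bigquery.tables.delete")},
}

# Deny policies take precedence over allow policies.
deny_rules = {("user:bob@example.com", "bigquery.tables.delete")}

def is_allowed(member: str, permission: str) -> bool:
    if (member, permission) in deny_rules:
        return False  # deny wins, regardless of any allow binding
    # Effective allows = union of grants at every level above the resource.
    return any((member, permission) in grants for grants in allow_grants.values())

# An org-level grant is inherited at lower levels (additive).
assert is_allowed("group:data-team@example.com", "bigquery.datasets.get")
# Bob's project-level allow is overridden by the deny policy.
assert not is_allowed("user:bob@example.com", "bigquery.tables.delete")
```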
5. IAM Conditions
Conditional role bindings allow you to grant access only when certain conditions are met, such as:
• Time-based access (e.g., access only during business hours)
• Resource attribute-based conditions (e.g., only for resources with specific tags or names)
• IP-based restrictions (when used with Access Context Manager)
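A conditional binding attaches a CEL expression to the role grant; conditions require IAM policy version 3. A sketch of the JSON shape, using a resource-attribute condition (the project, dataset prefix, and group are hypothetical):

```python
# A conditional role binding: the grant applies only to resources whose
# name starts with a given prefix. All names below are hypothetical.
conditional_binding = {
    "role": "roles/bigquery.dataViewer",
    "members": ["group:analysts@example.com"],
    "condition": {
        "title": "staging-datasets-only",
        "description": "Read access limited to staging resources",
        # CEL expression evaluated by IAM at request time.
        "expression": 'resource.name.startsWith("projects/my-project/datasets/staging_")',
    },
}

# Policies that contain conditional bindings must use version 3.
policy = {"version": 3, "bindings": [conditional_binding]}
```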
Key IAM Concepts for Data Services
BigQuery IAM
• roles/bigquery.dataViewer – Read access to data in datasets
• roles/bigquery.dataEditor – Read/write access to data
• roles/bigquery.dataOwner – Full control over datasets including managing access
• roles/bigquery.jobUser – Permission to run jobs (queries) in a project
• roles/bigquery.user – Can run jobs and create datasets
• roles/bigquery.admin – Full control over all BigQuery resources
• BigQuery supports dataset-level, table-level, column-level, and row-level security.
• Authorized Views allow you to share query results without giving users access to underlying tables.
• Column-level security uses policy tags from Data Catalog to restrict access to specific columns.
• Row-level security uses row access policies (filter expressions) to control which rows a user can see.
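Row access policies are created with BigQuery DDL. A sketch that composes such a statement; the table, policy name, group, and filter column are all made-up examples:

```python
def row_access_policy_ddl(policy_name: str, table: str,
                          grantee: str, filter_expr: str) -> str:
    """Compose BigQuery DDL for a row access policy (illustrative helper)."""
    return (
        f"CREATE ROW ACCESS POLICY {policy_name}\n"
        f"ON `{table}`\n"
        f"GRANT TO ('{grantee}')\n"
        f"FILTER USING ({filter_expr})"
    )

ddl = row_access_policy_ddl(
    policy_name="eu_rows_only",
    table="my-project.sales.orders",          # hypothetical table
    grantee="group:eu-analysts@example.com",  # hypothetical group
    filter_expr='region = "EU"',              # group members see only EU rows
)
print(ddl)
```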
Cloud Storage IAM
• roles/storage.objectViewer – Read objects
• roles/storage.objectCreator – Create objects (but not read or delete)
• roles/storage.objectAdmin – Full control over objects
• roles/storage.admin – Full control over buckets and objects
• Cloud Storage also supports ACLs (Access Control Lists) for object-level granularity, but uniform bucket-level access is recommended for simplicity and consistency with IAM.
Cloud Pub/Sub IAM
• roles/pubsub.publisher – Publish messages to a topic
• roles/pubsub.subscriber – Consume messages from a subscription
• roles/pubsub.viewer – View topics and subscriptions
Dataflow / Dataproc IAM
• roles/dataflow.developer – Create and manage Dataflow jobs
• roles/dataflow.worker – Assigned to the service account used by Dataflow worker VMs
• roles/dataproc.editor – Create and manage Dataproc clusters and jobs
• Service accounts used by Dataflow and Dataproc workers need permissions to access source/sink resources (e.g., GCS buckets, BigQuery datasets, Pub/Sub topics).
Cloud Composer IAM
• The Composer environment's service account needs access to all resources orchestrated by the DAGs.
• Follow least privilege — grant only the specific roles needed for each pipeline component.
Service Accounts Best Practices for Data Pipelines
• Use dedicated service accounts per pipeline or workload instead of the default compute service account.
• Grant only the minimum required roles.
• Use service account impersonation (via roles/iam.serviceAccountTokenCreator) instead of downloading keys.
• Avoid using service account keys whenever possible — prefer Workload Identity, attached service accounts, or federated identity.
• Rotate keys if they must be used, and store them in Secret Manager.
What Are Organization Policies?
Organization Policies are constraints set at the organization, folder, or project level that define what configurations are allowed across your GCP environment. They are complementary to IAM — while IAM controls who can do things, Organization Policies control what can be done regardless of IAM permissions.
Key Organization Policy Constraints for Data Systems:
1. Resource Location Restriction (constraints/gcp.resourceLocations)
• Restricts where resources can be created (e.g., only in US or EU regions).
• Critical for data residency and sovereignty requirements (GDPR, etc.).
• Example: Ensure all BigQuery datasets and GCS buckets are created only in europe-west1.
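The constraint is enforced by attaching a policy whose rules list the allowed locations; `in:eu-locations` is a Google-curated value group covering EU regions. A sketch of the Org Policy (v2 API) body, with a placeholder organization ID:

```python
# Org Policy (v2) body restricting resource locations to the EU.
# "123456789" is a placeholder organization ID.
location_policy = {
    "name": "organizations/123456789/policies/gcp.resourceLocations",
    "spec": {
        "rules": [
            # "in:eu-locations" expands to the Google-defined EU value group.
            {"values": {"allowedValues": ["in:eu-locations"]}}
        ]
    },
}
```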
2. Uniform Bucket-Level Access (constraints/storage.uniformBucketLevelAccess)
• Enforces uniform bucket-level access on all new Cloud Storage buckets, preventing the use of legacy ACLs.
3. Disable Service Account Key Creation (constraints/iam.disableServiceAccountKeyCreation)
• Prevents users from creating service account keys, reducing the risk of credential leakage.
4. Domain Restricted Sharing (constraints/iam.allowedPolicyMemberDomains)
• Ensures IAM policies can only grant access to members within specified domains.
• Prevents accidental sharing of data with external users.
5. Require OS Login (constraints/compute.requireOsLogin)
• Relevant for Dataproc clusters — ensures SSH access is managed through IAM rather than SSH keys.
6. Disable Serial Port Access (constraints/compute.disableSerialPortAccess)
• Prevents serial port access on VMs, relevant for Dataproc and Dataflow workers.
7. VPC Service Controls (used in conjunction with Organization Policies)
• Creates security perimeters around GCP services to prevent data exfiltration.
• Can restrict BigQuery, GCS, Pub/Sub, and other services to only communicate within defined perimeters.
• Service Perimeters define which projects and services are inside the boundary.
• Access Levels define conditions under which external access is allowed (e.g., from specific IP ranges or identities).
• Ingress/Egress Rules define allowed cross-perimeter communications.
8. CMEK Organization Policy (constraints/gcp.restrictNonCmekServices)
• Enforces that certain services must use Customer-Managed Encryption Keys (CMEK).
• Relevant for BigQuery, GCS, Pub/Sub, Dataflow, and Dataproc.
How IAM and Organization Policies Work Together
Think of security as layers:
1. Organization Policies define the guardrails — what is possible in your environment (e.g., data must stay in the EU, no service account keys).
2. IAM defines who can access specific resources and what they can do.
3. VPC Service Controls prevent data from leaving defined perimeters, even if IAM allows access.
4. Data-level security (column-level, row-level, authorized views) provides the finest granularity within services like BigQuery.
For a request to succeed, it must pass ALL layers:
• The organization policy must allow the action/configuration.
• IAM must grant the required permission to the identity.
• VPC Service Controls must permit the request's origin.
• Data-level security must allow access to the specific data being requested.
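This defense-in-depth evaluation can be sketched as a simple conjunction; each boolean stands in for a layer's real enforcement and is purely illustrative:

```python
# A request succeeds only if every security layer permits it.
# Each flag is a stand-in for the real enforcement mechanism.
def request_succeeds(org_policy_ok: bool, iam_ok: bool,
                     vpc_sc_ok: bool, data_level_ok: bool) -> bool:
    return all([org_policy_ok, iam_ok, vpc_sc_ok, data_level_ok])

# IAM grants access, but the request originates outside the VPC-SC perimeter:
assert not request_succeeds(org_policy_ok=True, iam_ok=True,
                            vpc_sc_ok=False, data_level_ok=True)
# All layers pass:
assert request_succeeds(True, True, True, True)
```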
The Principle of Least Privilege in Data Systems
This is a foundational concept that appears repeatedly on the exam:
• Grant only the minimum permissions needed to perform a task.
• Prefer predefined roles over basic roles.
• Use custom roles when predefined roles grant more permissions than needed.
• Apply roles at the narrowest scope possible (resource-level > project-level > folder-level > org-level).
• Use groups to manage access instead of granting roles to individual users.
• Regularly audit IAM policies using Policy Analyzer, and use IAM Recommender to identify and remove excess permissions.
Common Exam Scenarios
Scenario 1: Cross-project data access
A Dataflow pipeline in Project A needs to read from a BigQuery dataset in Project B and write to a GCS bucket in Project C.
→ The service account running the Dataflow job needs roles/bigquery.dataViewer and roles/bigquery.jobUser in Project B, and roles/storage.objectCreator in Project C.
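The grants in this scenario can be written out as a checklist for the pipeline's service account. Project IDs and the account email are hypothetical placeholders; note that `jobUser` is needed in whichever project the BigQuery read jobs actually execute (assumed here to match the scenario):

```python
# Cross-project grants required by the Dataflow job's service account.
# Project IDs and the SA email are hypothetical placeholders.
pipeline_sa = "dataflow-sa@project-a.iam.gserviceaccount.com"

required_grants = [
    ("project-b", "roles/bigquery.dataViewer"),    # read the source dataset
    ("project-b", "roles/bigquery.jobUser"),       # run read/extract jobs where they execute
    ("project-c", "roles/storage.objectCreator"),  # write output objects to the sink bucket
]

# Sanity check: every source and sink project is covered by a grant.
granted_projects = {project for project, _ in required_grants}
assert {"project-b", "project-c"} <= granted_projects
```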
Scenario 2: Data residency requirement
An organization requires all data to remain within the EU.
→ Set the constraints/gcp.resourceLocations organization policy to only allow EU locations. Use VPC Service Controls to prevent data exfiltration. Create BigQuery datasets with EU multi-region or specific EU region locations.
Scenario 3: Preventing external data sharing
Ensure no one can accidentally share BigQuery data with external users.
→ Use constraints/iam.allowedPolicyMemberDomains to restrict IAM bindings to your organization's domain. Implement VPC Service Controls with a service perimeter around BigQuery.
Scenario 4: Sensitive column restriction
A BigQuery table contains PII that should only be accessible by the data governance team.
→ Use column-level security with Data Catalog policy tags. Grant the roles/datacatalog.categoryFineGrainedReader role for the relevant policy tag to only the governance team members.
Scenario 5: Temporary access for a contractor
A contractor needs read access to a BigQuery dataset for two weeks.
→ Use IAM Conditions with a time-based expression to automatically expire the roles/bigquery.dataViewer role binding after two weeks.
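The expiring binding uses a `request.time` CEL condition. A sketch of its shape; the contractor email and cutoff timestamp are hypothetical:

```python
# Time-bounded IAM binding for the contractor scenario.
# The member email and expiry timestamp are made-up examples.
expiring_binding = {
    "role": "roles/bigquery.dataViewer",
    "members": ["user:contractor@example.com"],
    "condition": {
        "title": "expires-after-two-weeks",
        # CEL: the binding is effective only before the cutoff timestamp.
        "expression": 'request.time < timestamp("2025-07-15T00:00:00Z")',
    },
}

# Conditional bindings require IAM policy version 3.
policy = {"version": 3, "bindings": [expiring_binding]}
```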
Exam Tips: Answering Questions on IAM and Organization Policies for Data Systems
Tip 1: Always Choose Least Privilege
When multiple answers could work, choose the one that grants the fewest permissions. If one option uses roles/bigquery.dataViewer and another uses roles/bigquery.admin, choose the viewer role unless admin capabilities are explicitly needed.
Tip 2: Prefer Predefined Roles Over Basic Roles
If an answer uses Editor or Owner roles, it is almost always wrong in the context of data security. The exam expects you to know and select the appropriate predefined role for each service.
Tip 3: Know the Difference Between IAM and Organization Policies
If the question is about preventing a configuration across the organization (e.g., restricting resource locations, enforcing uniform bucket access), the answer involves Organization Policies. If the question is about granting or restricting access for specific users or service accounts, the answer involves IAM.
Tip 4: Remember That Organization Policies Override IAM
Even if a user has the IAM permission to create a GCS bucket in any region, an organization policy restricting resource locations to the EU will prevent them from creating a bucket in US regions.
Tip 5: VPC Service Controls for Data Exfiltration Prevention
When a question mentions preventing data exfiltration, data leakage, or ensuring data stays within organizational boundaries, think VPC Service Controls. IAM alone cannot prevent a user with legitimate read access from copying data to an external project.
Tip 6: Service Accounts Are Key for Pipeline Security
For data pipeline questions, focus on the service account used by the pipeline. Ensure it has the right roles on all source and sink resources. The exam frequently tests whether you understand that a Dataflow or Dataproc job's service account needs permissions in every project it interacts with.
Tip 7: Avoid Service Account Keys When Possible
If one answer involves downloading a service account key and another uses attached service accounts, Workload Identity, or impersonation, choose the keyless option.
Tip 8: Column-Level and Row-Level Security in BigQuery
For questions about restricting access to specific columns or rows within BigQuery tables, know that column-level security uses Data Catalog policy tags and row-level security uses row access policies. Authorized views are also used for controlled data sharing but are applied at the view level rather than individual columns or rows.
Tip 9: Understand Policy Inheritance
IAM policies are additive and inherited down the hierarchy. A permission granted at the organization level cannot be revoked at the project level (unless using deny policies). If a question asks how to remove excessive access, consider restructuring role bindings, using deny policies, or applying access at a narrower scope.
Tip 10: Watch for Separation of Duties
Some questions test whether you understand that the person who manages the pipeline should not necessarily have access to the data. For example, a roles/dataflow.developer can manage Dataflow jobs but doesn't automatically get access to the data in BigQuery or GCS — that requires separate IAM bindings.
Tip 11: Domain Restricted Sharing Is an Org Policy, Not IAM
If a question asks how to prevent sharing resources with external email addresses, the answer is the constraints/iam.allowedPolicyMemberDomains organization policy — not an IAM role modification.
Tip 12: Read Questions Carefully for Scope
Determine whether the question asks about a single project, a department (folder), or the entire organization. This determines at which level in the hierarchy you should apply the IAM policy or organization constraint.
Tip 13: Know Key Predefined Roles by Heart
Memorize the most common data-related predefined roles for BigQuery, Cloud Storage, Pub/Sub, Dataflow, Dataproc, and Cloud Composer. The exam expects you to distinguish between viewer, editor, admin, and specialized roles like bigquery.jobUser or storage.objectCreator.
Tip 14: Audit and Monitoring
For questions about detecting unauthorized access or reviewing who accessed what data, think Cloud Audit Logs (Admin Activity, Data Access logs), IAM Recommender, and Policy Analyzer. Enabling Data Access audit logs for BigQuery and GCS provides visibility into who read or modified data.
Tip 15: Encryption and CMEK Policies
When a question mentions regulatory requirements for encryption key management, think CMEK (Customer-Managed Encryption Keys) and the corresponding organization policy that can enforce CMEK usage across services. Google-managed encryption is the default but doesn't meet all compliance requirements.
Summary
IAM and Organization Policies are the backbone of data security in GCP. For the Professional Data Engineer exam, you must demonstrate mastery of:
• Designing least-privilege access for complex data pipelines spanning multiple services and projects
• Choosing the correct predefined roles for each GCP data service
• Using Organization Policies to enforce organizational constraints like data residency and domain-restricted sharing
• Implementing VPC Service Controls to prevent data exfiltration
• Applying fine-grained data access controls (column-level, row-level) in BigQuery
• Securing service accounts and avoiding key-based authentication
• Understanding how IAM, Organization Policies, VPC Service Controls, and data-level security work together as defense-in-depth layers
Mastering these concepts will help you answer a significant portion of the security-related questions on the exam confidently and correctly.