Learn Identify Considerations for Relational Data on Azure (DP-900) with Interactive Flashcards
Master key concepts in Identify Considerations for Relational Data on Azure through our interactive flashcard system. Click on each card to reveal detailed explanations and enhance your understanding.
Features of Relational Data
Relational data is one of the most common and widely used data models, organized into structured tables consisting of rows and columns. Here are the key features of relational data:
**1. Tables (Relations):** Data is stored in tables, where each table represents a specific entity such as customers, orders, or products. Each table has a defined structure known as a schema.
**2. Rows and Columns:** Each row (also called a record or tuple) represents a single instance of an entity, while each column (also called an attribute or field) represents a specific property of that entity. For example, a 'Customer' table may have columns like CustomerID, Name, and Email.
**3. Primary Keys:** Every table has a primary key, which is a unique identifier for each row. This ensures that no two rows in the same table have identical values for the primary key, maintaining data integrity.
**4. Foreign Keys:** Relationships between tables are established through foreign keys. A foreign key in one table references the primary key of another table, enabling connections between related data across multiple tables.
**5. Normalization:** Relational data follows normalization principles to minimize data redundancy and dependency. Data is divided into multiple related tables to avoid duplication and ensure consistency.
**6. ACID Compliance:** Relational databases support ACID properties (Atomicity, Consistency, Isolation, Durability), ensuring reliable transaction processing and data integrity even during system failures.
**7. SQL (Structured Query Language):** Relational data is queried and manipulated using SQL, a standardized language for creating, reading, updating, and deleting data (CRUD operations).
**8. Referential Integrity:** The database enforces rules to ensure that relationships between tables remain consistent. For instance, you cannot reference a non-existent primary key through a foreign key.
**9. Scalability and Indexing:** Relational databases support indexing for faster query performance and can scale to handle large volumes of structured data efficiently.
These features make relational data ideal for transactional systems, business applications, and scenarios requiring structured, consistent data management.
Primary Keys, Foreign Keys, and Relationships
In relational databases on Azure, Primary Keys, Foreign Keys, and Relationships are fundamental concepts that ensure data integrity and enable efficient data organization.
**Primary Keys** uniquely identify each record in a table. A primary key must contain unique, non-null values, ensuring that no two rows share the same identifier. For example, in a 'Customers' table, a 'CustomerID' column can serve as the primary key. Azure SQL Database and other Azure relational services enforce primary key constraints to maintain data uniqueness and integrity. Primary keys can be a single column or a combination of columns (composite key).
**Foreign Keys** establish a link between two tables by referencing the primary key of another table. A foreign key column in one table points to the primary key column in a related table, creating a parent-child relationship. For instance, an 'Orders' table may have a 'CustomerID' foreign key that references the 'CustomerID' primary key in the 'Customers' table. This ensures referential integrity, meaning you cannot insert an order for a customer that does not exist in the Customers table.
**Relationships** define how tables are connected through primary and foreign keys. There are three main types:
1. **One-to-Many**: The most common relationship where one record in a table relates to multiple records in another table (e.g., one customer can have many orders).
2. **One-to-One**: A single record in one table corresponds to exactly one record in another table.
3. **Many-to-Many**: Multiple records in one table relate to multiple records in another, typically implemented using a junction (bridge) table containing foreign keys from both tables.
In Azure relational services like Azure SQL Database, Azure Database for MySQL, and Azure Database for PostgreSQL, these concepts are critical for designing normalized, efficient schemas that reduce data redundancy, maintain consistency, and support complex queries through JOIN operations across related tables.
Database Normalization Concepts
Database normalization is a fundamental concept in relational database design that organizes data to reduce redundancy and improve data integrity. It involves structuring a database according to a series of progressive rules called normal forms.
**First Normal Form (1NF):** Each table cell must contain a single value, and each record must be unique. This eliminates repeating groups and ensures that every column holds atomic (indivisible) values. For example, instead of storing multiple phone numbers in one column, each phone number gets its own row or separate table.
**Second Normal Form (2NF):** Building on 1NF, all non-key columns must depend on the entire primary key. This applies particularly to tables with composite keys. Any column that depends on only part of the key should be moved to a separate table.
**Third Normal Form (3NF):** Building on 2NF, all columns must depend directly on the primary key, not on other non-key columns. This removes transitive dependencies. For instance, if a table stores customer ID, city, and zip code, and city depends on zip code rather than customer ID, city should be moved to a separate zip code table.
**Benefits of Normalization:**
- Minimizes data redundancy and storage waste
- Ensures data consistency and integrity
- Makes updates, insertions, and deletions more efficient
- Reduces the risk of data anomalies
**Trade-offs in Azure Context:**
While normalization is ideal for transactional databases like Azure SQL Database and Azure Database for PostgreSQL, it can introduce complexity through multiple table joins, potentially impacting query performance. In Azure, denormalization (intentionally introducing some redundancy) is sometimes used in analytical workloads like Azure Synapse Analytics to optimize read-heavy operations.
Understanding normalization is essential when designing relational databases on Azure, as it directly impacts performance, scalability, and data quality across services like Azure SQL Database, Azure SQL Managed Instance, and other Azure relational data offerings.
SQL Data Definition Language Statements
SQL Data Definition Language (DDL) Statements are a subset of SQL commands used to define, modify, and manage the structure of database objects in relational databases, including those hosted on Azure such as Azure SQL Database, Azure SQL Managed Instance, and Azure Database for PostgreSQL/MySQL.
The primary DDL statements include:
1. **CREATE**: Used to create new database objects such as tables, views, indexes, schemas, and stored procedures. For example, 'CREATE TABLE Employees (EmployeeID INT PRIMARY KEY, Name VARCHAR(100), Department VARCHAR(50))' creates a new table with specified columns and data types.
2. **ALTER**: Used to modify the structure of existing database objects. You can add, remove, or modify columns in a table, change data types, or add constraints. For example, 'ALTER TABLE Employees ADD Salary DECIMAL(10,2)' adds a new column to an existing table.
3. **DROP**: Used to delete database objects entirely. For example, 'DROP TABLE Employees' removes the table and all its data permanently from the database.
4. **RENAME**: Used to rename existing database objects such as tables or columns.
Key considerations for DDL in Azure relational databases include:
- **Schema Design**: DDL statements help establish the schema, which defines the logical structure including tables, relationships, primary keys, and foreign keys that enforce referential integrity.
- **Data Types**: When creating tables, you must specify appropriate data types (INT, VARCHAR, DATETIME, DECIMAL, etc.) for each column to ensure data integrity and optimize storage.
- **Constraints**: DDL allows defining constraints like PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, and CHECK to enforce business rules at the database level.
- **Permissions**: Executing DDL statements typically requires elevated permissions, as these operations affect the database structure.
DDL statements are critical for database administrators and developers working with Azure relational data services, as they form the foundation upon which data is stored, organized, and maintained throughout the lifecycle of an application.
SQL Data Manipulation Language Statements
SQL Data Manipulation Language (DML) statements are a subset of SQL commands used to query, insert, modify, and delete data within relational database tables in Azure and other database systems. DML is essential for managing the data stored in relational databases, and there are four primary DML statements:
1. **SELECT**: The most commonly used DML statement, SELECT retrieves data from one or more tables. It allows you to specify columns, filter rows using WHERE clauses, sort results with ORDER BY, group data using GROUP BY, and join multiple tables together. For example: SELECT Name, Age FROM Employees WHERE Department = 'Sales'.
2. **INSERT**: This statement adds new rows of data into a table. You specify the target table, the columns, and the corresponding values. For example: INSERT INTO Employees (Name, Age, Department) VALUES ('John', 30, 'Marketing'). You can insert single or multiple rows at once.
3. **UPDATE**: Used to modify existing data in a table, UPDATE changes column values for rows that match a specified condition. It is critical to use a WHERE clause to avoid unintentionally updating all rows. For example: UPDATE Employees SET Department = 'Finance' WHERE Name = 'John'.
4. **DELETE**: This statement removes rows from a table based on a condition. Like UPDATE, a WHERE clause should be used carefully to target specific rows. For example: DELETE FROM Employees WHERE Age > 65.
In Azure, DML statements work consistently across services like Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics. These statements follow the standard T-SQL (Transact-SQL) syntax used in Microsoft SQL Server.
DML statements operate within transactions, meaning changes can be committed or rolled back to maintain data integrity. Understanding DML is fundamental for anyone working with relational data on Azure, as it forms the core mechanism for interacting with and managing data stored in relational database systems.
SQL Query Filtering, Sorting, and Aggregation
SQL Query Filtering, Sorting, and Aggregation are fundamental operations used to retrieve and manipulate data in relational databases on Azure, such as Azure SQL Database, Azure Database for MySQL, and Azure Database for PostgreSQL.
**Filtering** is performed using the WHERE clause, which allows you to narrow down result sets by specifying conditions. For example, 'SELECT * FROM Products WHERE Price > 50' retrieves only products with a price greater than 50. You can combine multiple conditions using logical operators like AND, OR, and NOT. Additional filtering keywords include LIKE for pattern matching, IN for matching against a list of values, and BETWEEN for range-based filtering. The HAVING clause is used specifically to filter results after aggregation has been applied.
**Sorting** is accomplished using the ORDER BY clause, which arranges the result set in ascending (ASC) or descending (DESC) order based on one or more columns. For example, 'SELECT * FROM Employees ORDER BY LastName ASC, HireDate DESC' sorts employees alphabetically by last name and then by hire date in reverse order. By default, ORDER BY sorts in ascending order if no direction is specified.
**Aggregation** involves using aggregate functions to perform calculations across multiple rows and return a single summary value. Common aggregate functions include COUNT (number of rows), SUM (total of numeric values), AVG (average), MIN (minimum value), and MAX (maximum value). The GROUP BY clause is essential for aggregation, as it groups rows sharing common values so aggregate functions can be applied per group. For example, 'SELECT Category, COUNT(*) AS TotalProducts FROM Products GROUP BY Category' returns the number of products in each category.
These three operations are often combined in a single query to extract meaningful insights from Azure relational databases. Together, they form the backbone of data analysis in SQL, enabling users to efficiently query large datasets, identify trends, and generate reports within Azure's cloud-based relational data services.
SQL Joins and Subqueries
SQL Joins and Subqueries are fundamental concepts in relational databases on Azure, enabling users to retrieve and combine data from multiple tables efficiently.
**SQL Joins** allow you to combine rows from two or more tables based on a related column between them. There are several types of joins:
1. **INNER JOIN**: Returns only the rows where there is a matching value in both tables. This is the most commonly used join.
2. **LEFT (OUTER) JOIN**: Returns all rows from the left table and the matched rows from the right table. If no match exists, NULL values are returned for the right table's columns.
3. **RIGHT (OUTER) JOIN**: Returns all rows from the right table and matched rows from the left table, with NULLs where no match exists.
4. **FULL (OUTER) JOIN**: Returns all rows from both tables, filling in NULLs where there is no match on either side.
5. **CROSS JOIN**: Returns the Cartesian product of both tables, combining every row from the first table with every row from the second.
6. **SELF JOIN**: A table is joined with itself, useful for hierarchical or comparative queries.
For example, joining a Customers table with an Orders table using INNER JOIN retrieves only customers who have placed orders.
**Subqueries** (also called nested queries) are queries embedded within another SQL query. They can appear in SELECT, FROM, WHERE, or HAVING clauses. Subqueries are useful for filtering results based on aggregated data or conditions from another table. They can be **correlated** (referencing the outer query) or **non-correlated** (independent of the outer query).
For example, a subquery can find all customers whose order total exceeds the average: SELECT * FROM Customers WHERE CustomerID IN (SELECT CustomerID FROM Orders WHERE Amount > (SELECT AVG(Amount) FROM Orders)).
In Azure SQL Database, Azure Synapse Analytics, and other Azure relational services, both joins and subqueries are fully supported, enabling powerful data retrieval and analysis across normalized relational schemas.
Common Database Objects
Common Database Objects in Azure relational databases are fundamental structures used to store, manage, and manipulate data efficiently. Here are the key objects:
**1. Tables:** Tables are the core database objects that store data in rows and columns. Each row represents a unique record, and each column represents a field with a specific data type. Tables enforce structure through schemas, ensuring data consistency and integrity.
**2. Views:** Views are virtual tables based on the result of a SQL query. They don't store data themselves but provide a way to simplify complex queries, restrict access to specific data, and present data in a customized format. Views enhance security by exposing only necessary columns or rows to users.
**3. Indexes:** Indexes improve the speed of data retrieval operations on tables. They work similarly to a book's index, allowing the database engine to find data without scanning every row. While indexes boost read performance, they can slightly slow down write operations due to the overhead of maintaining the index structure.
**4. Stored Procedures:** Stored procedures are precompiled collections of SQL statements stored in the database. They encapsulate business logic, promote code reuse, improve performance through precompilation, and enhance security by controlling data access through parameterized operations.
**5. Functions:** Functions are reusable blocks of SQL code that accept parameters, perform calculations or operations, and return a value or a table. They help simplify complex computations and can be used within queries.
**6. Triggers:** Triggers are special stored procedures that automatically execute in response to specific events on a table, such as INSERT, UPDATE, or DELETE operations. They are commonly used to enforce business rules, maintain audit trails, and ensure data integrity.
These database objects work together to create a robust, efficient, and secure relational database system in Azure, supporting services like Azure SQL Database, Azure Database for MySQL, and Azure Database for PostgreSQL.
Azure SQL Database
Azure SQL Database is a fully managed Platform-as-a-Service (PaaS) database engine hosted in Microsoft Azure's cloud environment. It is built on the latest stable version of the Microsoft SQL Server database engine, offering a reliable and high-performance relational database solution without the need for managing underlying infrastructure.
Key features of Azure SQL Database include:
1. **Fully Managed Service**: Microsoft handles all database management tasks such as patching, backups, monitoring, and updates, allowing users to focus on application development rather than infrastructure maintenance.
2. **Deployment Options**: Azure SQL Database offers two main deployment models — Single Database (an isolated database with its own dedicated resources) and Elastic Pool (multiple databases sharing a common pool of resources, ideal for cost optimization).
3. **Scalability**: It supports dynamic scalability, allowing users to scale compute and storage resources up or down based on workload demands. This includes options like DTU-based (Database Transaction Unit) and vCore-based purchasing models.
4. **High Availability**: Azure SQL Database provides built-in high availability with a 99.99% uptime SLA, leveraging redundant infrastructure and automatic failover mechanisms.
5. **Security**: It includes advanced security features such as Transparent Data Encryption (TDE), Advanced Threat Protection, auditing, data masking, and Azure Active Directory integration for authentication.
6. **Intelligent Performance**: Features like automatic tuning, query performance insights, and intelligent recommendations help optimize database performance over time.
7. **Backup and Recovery**: Automated backups with point-in-time restore capabilities ensure data protection, with configurable retention periods and geo-redundant backup storage.
Azure SQL Database is ideal for modern cloud applications, SaaS solutions, and scenarios requiring a low-maintenance, scalable relational database. It supports T-SQL, making migration from on-premises SQL Server relatively seamless. It is a strong choice for organizations looking to leverage the power of SQL Server without the overhead of managing physical or virtual infrastructure.
Azure SQL Managed Instance
Azure SQL Managed Instance is a fully managed Platform-as-a-Service (PaaS) database engine hosted in the Azure cloud that provides near 100% compatibility with the latest SQL Server (Enterprise Edition) database engine. It is designed to facilitate easy migration of on-premises SQL Server databases to the cloud with minimal changes.
Key features and considerations include:
1. **Near-Complete SQL Server Compatibility**: Unlike Azure SQL Database, Managed Instance supports features such as cross-database queries, SQL Server Agent, Common Language Runtime (CLR), Database Mail, linked servers, and distributed transactions, making it ideal for lift-and-shift migrations.
2. **Fully Managed Service**: Microsoft handles patching, backups, monitoring, and high availability, reducing the administrative burden on database administrators while still providing granular control over configurations.
3. **VNet Integration**: Managed Instance is deployed within an Azure Virtual Network (VNet), providing network isolation and secure connectivity. This ensures that the instance is not exposed to the public internet by default, enhancing security.
4. **High Availability**: It comes with built-in high availability through Always On Availability Groups technology, ensuring business continuity with automated failover capabilities.
5. **Backup and Restore**: Automated backups are included with configurable retention periods, and users can perform point-in-time restores. Native backup and restore from Azure Blob Storage is also supported.
6. **Pricing Tiers**: It offers General Purpose and Business Critical service tiers, allowing customers to choose based on performance, storage, and availability requirements.
7. **Migration Path**: Azure Database Migration Service (DMS) can be used to seamlessly migrate on-premises databases to Managed Instance with minimal downtime.
8. **Scalability**: Users can scale compute and storage resources independently based on workload demands.
Azure SQL Managed Instance bridges the gap between on-premises SQL Server and Azure SQL Database, making it the best choice for organizations that need full SQL Server compatibility in a managed cloud environment while minimizing migration effort and application changes.
SQL Server on Azure Virtual Machines
SQL Server on Azure Virtual Machines (VMs) is an Infrastructure-as-a-Service (IaaS) offering that allows you to run fully licensed SQL Server instances on cloud-hosted virtual machines in Microsoft Azure. This approach is often referred to as a 'lift-and-shift' solution because it enables organizations to migrate their existing on-premises SQL Server workloads to the cloud with minimal changes to applications and database configurations.
With SQL Server on Azure VMs, you get full control over the SQL Server instance and the underlying operating system. You can choose the version and edition of SQL Server you need, including SQL Server 2012, 2014, 2016, 2017, 2019, and 2022. This makes it ideal for scenarios requiring OS-level access, specific SQL Server features not available in managed services, or legacy application compatibility.
Key benefits include:
1. **Full Administrative Control**: You manage everything from the operating system to SQL Server configurations, including patching, backups, and high availability settings.
2. **Flexibility**: You can select VM sizes based on your workload requirements, scaling compute, memory, and storage independently.
3. **Compatibility**: It supports all SQL Server features, making it suitable for applications that depend on features unavailable in Azure SQL Database or Azure SQL Managed Instance.
4. **Hybrid Connectivity**: It integrates with on-premises networks through Azure Virtual Networks and VPN gateways.
5. **Cost Options**: You can use pay-as-you-go pricing or leverage existing SQL Server licenses through Azure Hybrid Benefit to reduce costs.
However, since it is an IaaS solution, the customer is responsible for managing and maintaining the VM, including OS updates, security patches, and database administration tasks. This contrasts with PaaS offerings like Azure SQL Database, where Microsoft handles much of the infrastructure management.
SQL Server on Azure VMs is best suited for migrations requiring minimal code changes, applications needing specific SQL Server versions, and workloads requiring full SQL Server functionality with granular infrastructure control.
Choosing Among Azure SQL Family Products
Choosing among Azure SQL family products requires understanding the key offerings and their ideal use cases. The Azure SQL family includes three primary products: Azure SQL Database, Azure SQL Managed Instance, and SQL Server on Azure Virtual Machines.
**Azure SQL Database** is a fully managed Platform-as-a-Service (PaaS) database engine that handles most database management functions such as upgrading, patching, backups, and monitoring without user involvement. It is best suited for modern cloud-born applications that need the latest stable SQL Server features. It offers single database and elastic pool deployment options. Elastic pools allow multiple databases to share resources, making it cost-effective for databases with variable usage patterns. It provides built-in high availability, intelligence, and management.
**Azure SQL Managed Instance** is also a PaaS offering but provides near 100% compatibility with the on-premises SQL Server Enterprise Edition engine. It is ideal for organizations looking to migrate on-premises SQL Server workloads to the cloud with minimal changes. It supports features like SQL Server Agent, cross-database queries, Service Broker, and CLR integration that are not available in Azure SQL Database. This makes it the best lift-and-shift migration option.
**SQL Server on Azure Virtual Machines** is an Infrastructure-as-a-Service (IaaS) option that provides full control over the SQL Server instance and the underlying operating system. It is ideal when you need OS-level access, require specific SQL Server versions, or need features not supported by the managed options. It suits applications requiring customization beyond what PaaS offerings allow.
**Key considerations when choosing** include: the level of administrative control needed, compatibility requirements with existing on-premises SQL Server, budget constraints, high availability needs, and the degree of cloud-native features desired. PaaS options reduce management overhead, while IaaS provides maximum flexibility. Organizations should evaluate their migration complexity, long-term scalability, and operational preferences to select the most appropriate Azure SQL product.
Azure Database for MySQL
Azure Database for MySQL is a fully managed relational database service provided by Microsoft Azure, based on the MySQL Community Edition database engine. It enables organizations to host MySQL databases in the cloud without the need to manage underlying infrastructure, patching, or backups manually.
Key features of Azure Database for MySQL include:
1. **Fully Managed Service**: Azure handles routine database maintenance tasks such as patching, automated backups, and monitoring, allowing developers and administrators to focus on application development rather than infrastructure management.
2. **High Availability**: The service offers built-in high availability with no additional configuration required. It provides a financially backed SLA of up to 99.99% uptime, ensuring business continuity and minimal downtime.
3. **Scalability**: Azure Database for MySQL supports flexible scaling of compute and storage resources independently. Users can scale up or down based on workload demands, paying only for the resources they consume.
4. **Security**: The service includes advanced security features such as data encryption at rest and in transit, Azure Active Directory authentication, firewall rules, and Virtual Network service endpoints to protect sensitive data.
5. **Automated Backups**: It provides automatic backups with point-in-time restore capabilities for up to 35 days, ensuring data protection and disaster recovery.
6. **Deployment Options**: Azure offers two deployment modes — Single Server and Flexible Server. The Flexible Server option provides more granular control over database configurations, maintenance windows, and cost optimization.
7. **Compatibility**: Since it is based on MySQL Community Edition, existing MySQL applications can be migrated to Azure with minimal code changes, supporting common MySQL tools and frameworks.
8. **Monitoring and Diagnostics**: Integration with Azure Monitor and diagnostic logs allows users to track performance metrics, set alerts, and troubleshoot issues efficiently.
Azure Database for MySQL is ideal for web applications, e-commerce platforms, and content management systems that rely on MySQL, offering a reliable, secure, and cost-effective cloud-based database solution.
Azure Database for PostgreSQL
Azure Database for PostgreSQL is a fully managed relational database service provided by Microsoft Azure, based on the open-source PostgreSQL database engine. It allows organizations to leverage the power of PostgreSQL without the complexity of managing infrastructure, patching, backups, or security configurations manually.
Azure Database for PostgreSQL is available in multiple deployment options:
1. **Single Server**: A fully managed database service with minimal customization requirements. It offers built-in high availability, automated backups with point-in-time restore for up to 35 days, and predictable performance. This option is suitable for applications that don't require extensive database customization.
2. **Flexible Server**: This is the next-generation deployment option that provides more granular control over database management and configurations. It offers better cost optimization with the ability to stop and start the server, burstable compute tiers, and zone-redundant high availability. Flexible Server is ideal for production workloads requiring fine-tuned control.
3. **Hyperscale (Citus)**: This option horizontally scales queries across multiple machines using sharding. It is designed for applications that require greater scale and performance, particularly for multi-tenant applications and real-time analytical workloads.
Key features of Azure Database for PostgreSQL include:
- **High Availability**: Built-in HA with up to 99.99% uptime SLA.
- **Security**: Data encryption at rest and in transit, Azure Active Directory authentication, and firewall rules.
- **Automatic Backups**: Automated backups with configurable retention periods.
- **Scalability**: Easy scaling of compute and storage resources independently.
- **Monitoring**: Integration with Azure Monitor and diagnostic logging for performance insights.
- **Community Compatibility**: Supports popular PostgreSQL extensions and tools, ensuring compatibility with existing applications.
Azure Database for PostgreSQL is ideal for developers and organizations already familiar with PostgreSQL who want a cloud-native, scalable, and secure database solution without the overhead of infrastructure management, making it a strong choice for modern application development on Azure.