Azure Cosmos DB APIs for Cassandra, Table, and Gremlin
Azure Cosmos DB is a globally distributed, multi-model database service that supports multiple APIs, allowing developers to interact with data using familiar interfaces.
**Cassandra API:** The Cassandra API in Azure Cosmos DB enables developers who are already familiar with Apache Cassandra to use Cosmos DB as the underlying data store with minimal code changes. It supports the Cassandra Query Language (CQL), Cassandra drivers, and tools, making migration seamless. Data is stored in a column-family format, which is ideal for handling large volumes of data across distributed systems. This API is particularly useful for applications requiring high write throughput, flexible schemas, and wide-column storage. It provides the benefits of Cosmos DB such as global distribution, elastic scalability, and guaranteed low latency while maintaining Cassandra compatibility.
**Table API:** The Table API offers a key-value storage model similar to Azure Table Storage but with significant enhancements. It provides premium capabilities including global distribution, automatic indexing, dedicated throughput, and single-digit millisecond latency. Applications already using Azure Table Storage can migrate to Cosmos DB Table API with minimal changes and benefit from higher SLAs and performance guarantees. Data is organized in tables with rows identified by partition keys and row keys. This API is ideal for applications that need simple key-value lookups, semi-structured data storage, and don't require complex querying or relationships.
**Gremlin API:** The Gremlin API supports graph database functionality, allowing you to model, store, and query data as graphs consisting of vertices (nodes) and edges (relationships). It uses the Apache TinkerPop Gremlin traversal language for querying graph structures. This API is perfect for scenarios involving complex relationships such as social networks, recommendation engines, fraud detection, and knowledge graphs. It enables efficient traversal of deeply connected datasets where relationships between entities are as important as the entities themselves.
All three APIs benefit from Cosmos DB's core features: turnkey global distribution, elastic scalability, comprehensive SLAs, and automatic indexing.
Azure Cosmos DB APIs: Cassandra, Table, and Gremlin – A Complete Guide for DP-900
Introduction
Azure Cosmos DB is Microsoft's globally distributed, multi-model database service. One of its most powerful features is the ability to interact with data using multiple APIs, allowing developers to use familiar interfaces while leveraging Cosmos DB's underlying capabilities such as global distribution, elastic scalability, and guaranteed low latency. Beyond the commonly discussed SQL (Core) API and MongoDB API, Cosmos DB also supports the Cassandra API, Table API, and Gremlin API. Understanding these three APIs is essential for the DP-900 (Azure Data Fundamentals) exam.
Why Are These APIs Important?
These APIs are important because they allow organizations to:
• Migrate existing workloads from Apache Cassandra, Azure Table Storage, or Apache TinkerPop (Gremlin) graph databases to Cosmos DB with minimal code changes.
• Leverage Cosmos DB's enterprise features (turnkey global distribution, automatic indexing, availability SLAs of up to 99.999% for multi-region accounts, and multiple consistency levels) without abandoning familiar data models and query languages.
• Choose the right data model for the problem at hand: wide-column for Cassandra, key-value for Table, and graph for Gremlin.
• Reduce operational overhead by consolidating different database engines into a single managed service.
For the DP-900 exam, Microsoft expects you to understand when and why each API should be used, and how they differ from one another.
1. Azure Cosmos DB API for Apache Cassandra
What Is It?
The Cassandra API allows you to interact with data stored in Azure Cosmos DB using the Cassandra Query Language (CQL) and Cassandra-compatible drivers and tools. It is wire-protocol compatible with Apache Cassandra, meaning applications that already use Cassandra can often connect to Cosmos DB with only a connection string change.
Data Model:
Cassandra uses a wide-column store model. Data is organized into:
• Keyspaces (similar to databases)
• Tables (collections of rows)
• Rows that can have a flexible number of columns
• Each row is identified by a partition key and optional clustering columns
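To make the wide-column layout concrete, here is a minimal pure-Python sketch. It does not use a Cassandra driver; the `WideColumnTable` class and the sensor readings are hypothetical and exist only to show that rows are grouped by partition key, ordered by a clustering column, and may carry different columns.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model of a wide-column table (illustration only, not a driver API)."""

    def __init__(self):
        # partition key -> {clustering value -> {column name -> value}}
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key, clustering_value, **columns):
        # Writing to an existing (partition, clustering) pair merges columns,
        # similar to how a CQL INSERT behaves as an upsert.
        row = self._partitions[partition_key].setdefault(clustering_value, {})
        row.update(columns)

    def read_partition(self, partition_key):
        # Rows inside one partition come back ordered by the clustering value.
        rows = self._partitions.get(partition_key, {})
        return [(key, rows[key]) for key in sorted(rows)]


table = WideColumnTable()
table.upsert("sensor-1", "2024-01-01T00:05", temperature=21.5)
table.upsert("sensor-1", "2024-01-01T00:00", temperature=21.0, humidity=40)
table.upsert("sensor-2", "2024-01-01T00:00", pressure=1013)

rows = table.read_partition("sensor-1")
```

Reading `sensor-1` returns its two rows in time order even though they were written out of order, and each row keeps only the columns that were actually written to it.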
How It Works:
• You create a Cosmos DB account and select the Cassandra API during provisioning.
• You use CQL (Cassandra Query Language) to create keyspaces, tables, and perform CRUD operations.
• Existing Cassandra SDKs and drivers (e.g., DataStax drivers for Java, Python, .NET) can connect to the Cosmos DB Cassandra endpoint.
• Cosmos DB maps Cassandra's partitioning model to its own internal partitioning scheme, providing automatic distribution and scaling.
• Throughput is provisioned in Request Units per second (RU/s), as with all Cosmos DB APIs.
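The steps above can be sketched as plain CQL text assembled in Python. This is only an illustration: the keyspace and table names are hypothetical, the host pattern and port 10350 are assumptions to verify against your own account's connection details, and the replication values are placeholders required by CQL syntax. With a real Cassandra driver, strings like these would typically be passed to a session's execute call.

```python
COSMOS_CASSANDRA_PORT = 10350  # assumption: check the port shown for your account


def cassandra_endpoint(account_name):
    """Return (host, port) for a hypothetical Cosmos DB Cassandra account."""
    return f"{account_name}.cassandra.cosmos.azure.com", COSMOS_CASSANDRA_PORT


def schema_statements(keyspace, table):
    """Build CQL that creates a keyspace and a table keyed by device and time."""
    return [
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}",
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} ("
        "device_id text, reading_time timestamp, temperature double, "
        "PRIMARY KEY ((device_id), reading_time))",
    ]


def crud_statements(keyspace, table):
    """Build parameterized CQL for the four CRUD operations."""
    target = f"{keyspace}.{table}"
    return {
        "create": f"INSERT INTO {target} (device_id, reading_time, temperature) "
                  "VALUES (%s, %s, %s)",
        "read": f"SELECT reading_time, temperature FROM {target} WHERE device_id = %s",
        "update": f"UPDATE {target} SET temperature = %s "
                  "WHERE device_id = %s AND reading_time = %s",
        "delete": f"DELETE FROM {target} WHERE device_id = %s AND reading_time = %s",
    }


host, port = cassandra_endpoint("myaccount")  # "myaccount" is a placeholder
ddl = schema_statements("telemetry", "readings")
crud = crud_statements("telemetry", "readings")
```

In the `PRIMARY KEY` clause, `device_id` is the partition key and `reading_time` is the clustering column, which matches the data model described earlier.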
When to Use It:
• You have an existing Apache Cassandra workload and want to migrate to a fully managed service.
• You need a wide-column data model for high-throughput, write-heavy scenarios such as IoT telemetry, time-series data, or event logging.
• Your development team is already proficient in CQL.
Key Differentiators from Native Cassandra:
• Fully managed — no need to manage clusters, nodes, or patches.
• Guaranteed low-latency reads and writes backed by SLAs.
• Turnkey global distribution with multi-region writes.
• Five well-defined consistency levels configured through Cosmos DB; Cassandra's tunable per-operation levels (such as ONE, QUORUM, and ALL) are mapped onto them.
2. Azure Cosmos DB API for Table
What Is It?
The Table API provides a premium alternative to Azure Table Storage. It allows you to store key-value data in a schema-less table format and access it using the same Azure Table Storage SDKs and REST APIs, but with the added benefits of Cosmos DB's global distribution, indexing, and performance guarantees.
Data Model:
The Table API uses a key-value / key-attribute model:
• Data is stored in tables.
• Each entity (row) is uniquely identified by a combination of PartitionKey and RowKey.
• Entities can have up to 255 properties (columns), and each entity in the same table can have different properties (schema-less).
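A small in-memory sketch can illustrate the key-attribute model. The `ToyEntityTable` class below is hypothetical and is not part of any Azure SDK; it only shows that every entity needs a PartitionKey and RowKey while the remaining properties can differ from entity to entity.

```python
class ToyEntityTable:
    """In-memory stand-in for a table (illustration only, not an Azure SDK class)."""

    def __init__(self):
        self._entities = {}

    def upsert(self, entity):
        # Both key properties are mandatory; every other property is optional.
        for required in ("PartitionKey", "RowKey"):
            if required not in entity:
                raise ValueError(f"entity is missing {required}")
        key = (entity["PartitionKey"], entity["RowKey"])
        self._entities[key] = dict(entity)

    def get(self, partition_key, row_key):
        # A point lookup uses the full (PartitionKey, RowKey) pair.
        return self._entities.get((partition_key, row_key))


users = ToyEntityTable()
users.upsert({"PartitionKey": "emea", "RowKey": "u-001", "Name": "Ana", "Tier": "gold"})
users.upsert({"PartitionKey": "emea", "RowKey": "u-002", "Name": "Ben",
              "LastLogin": "2024-03-01"})

ana = users.get("emea", "u-001")
ben = users.get("emea", "u-002")
```

Both entities live in the same table and partition, yet only one has a `Tier` property and only the other has `LastLogin`.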
How It Works:
• You create a Cosmos DB account and select the Table API.
• You use the Azure Tables SDK (or the older Azure Storage SDK) to perform operations like insert, update, delete, and query on table entities.
• Applications currently using Azure Table Storage can migrate by simply changing the connection string to point to the Cosmos DB Table API endpoint (in most cases).
• Cosmos DB automatically indexes all properties, unlike Azure Table Storage which only indexes PartitionKey and RowKey.
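Queries against table entities are commonly expressed as OData filter strings. The helper below is a minimal sketch of building such a filter for equality conditions; the function names are hypothetical, and passing the result to an SDK query method (for example a `query_filter` argument in the Azure Tables SDK for Python) is an assumption that is not exercised here.

```python
def odata_literal(value):
    """Render a Python value as an OData literal."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings are single-quoted; embedded single quotes are doubled.
    return "'" + str(value).replace("'", "''") + "'"


def build_filter(**equals):
    """Join equality conditions with 'and', keeping the argument order."""
    return " and ".join(
        f"{name} eq {odata_literal(value)}" for name, value in equals.items()
    )


point_lookup = build_filter(PartitionKey="emea", RowKey="u-001")
by_property = build_filter(PartitionKey="emea", Tier="gold", Active=True)
```

The first filter targets a single entity through its two keys. The second filters on non-key properties, which is the kind of query that benefits from Cosmos DB indexing every property.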
When to Use It:
• You have an existing Azure Table Storage application and need better performance, global distribution, or richer SLAs.
• You need a simple key-value store with flexible schemas and don't require complex querying.
• Scenarios include configuration storage, user profile data, metadata catalogs, and simple lookup tables.
Key Differentiators from Azure Table Storage:
Feature — Azure Table Storage vs. Cosmos DB Table API
• Latency: Variable vs. Single-digit millisecond (backed by SLA)
• Throughput: Scalability limit of 20,000 operations/sec per table vs. No fixed upper limit (elastically scalable with provisioned RUs)
• Global Distribution: Single region (with GRS replication) vs. Turnkey multi-region distribution
• Indexing: Only on PartitionKey and RowKey vs. Automatic indexing on all properties
• Consistency: Strong within the primary region, eventual within the secondary region vs. Five well-defined consistency levels
• Cost: Lower cost for basic workloads vs. Higher cost but with premium features
3. Azure Cosmos DB API for Apache Gremlin (Graph)
What Is It?
The Gremlin API enables you to store and query graph data using the Apache TinkerPop Gremlin traversal language. It is designed for scenarios where relationships between entities are as important as the entities themselves.
Data Model:
Graph databases use a property graph model consisting of:
• Vertices (nodes): Represent entities such as people, products, or locations. Each vertex has a label, an ID, and properties.
• Edges (relationships): Represent connections between vertices. Each edge has a label, direction, and properties.
• Properties: Key-value pairs attached to vertices or edges that store additional information.
For example, in a social network: a Person vertex might have properties like name and age, and a FRIENDS_WITH edge would connect two Person vertices.
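The social network example can be written down as a tiny property graph in Python. The `Vertex` and `Edge` classes are hypothetical illustrations of the model and do not reflect how Cosmos DB or any Gremlin library represents graph elements.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: str  # id of the vertex the edge starts from
    in_v: str   # id of the vertex the edge points to
    properties: dict = field(default_factory=dict)


alice = Vertex("alice", "person", {"name": "Alice", "age": 34})
bob = Vertex("bob", "person", {"name": "Bob", "age": 29})

# The edge has a label, a direction (alice -> bob), and its own properties.
friendship = Edge("FRIENDS_WITH", out_v=alice.id, in_v=bob.id,
                  properties={"since": 2019})
```

Note that the edge carries its own properties (`since`), which is what distinguishes a property graph from a plain list of links.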
How It Works:
• You create a Cosmos DB account and select the Gremlin (Graph) API.
• Data is organized into databases and graphs (containers).
• You use the Gremlin traversal language to add vertices, add edges, and traverse the graph. Example: g.V().hasLabel('person').has('name','Alice').out('FRIENDS_WITH') finds all people Alice is friends with.
• Gremlin-compatible SDKs (Java, .NET, Python, Node.js) and tools like the Gremlin Console or the Azure portal's Data Explorer can be used.
• Cosmos DB partitions the graph data for scalability. You define a partition key on the graph container.
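Adding vertices, adding edges, and traversing can be sketched as Gremlin text built in Python. This assumes traversals are submitted to the endpoint as strings; the helper names and the partition key property `pk` are hypothetical choices for this illustration.

```python
def _literal(value):
    """Render a Python value as a Gremlin literal (strings quoted, numbers bare)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def add_vertex(label, vertex_id, partition_value, **properties):
    # 'pk' is a hypothetical partition key property chosen for this sketch.
    steps = [
        f"g.addV({_literal(label)})",
        f".property('id', {_literal(vertex_id)})",
        f".property('pk', {_literal(partition_value)})",
    ]
    steps += [
        f".property({_literal(name)}, {_literal(value)})"
        for name, value in properties.items()
    ]
    return "".join(steps)


def add_edge(from_id, label, to_id):
    return f"g.V({_literal(from_id)}).addE({_literal(label)}).to(g.V({_literal(to_id)}))"


def friends_of(name):
    # Same shape as the traversal quoted in the text above.
    return f"g.V().hasLabel('person').has('name', {_literal(name)}).out('FRIENDS_WITH')"


create_alice = add_vertex("person", "alice", "alice", name="Alice", age=34)
link = add_edge("alice", "FRIENDS_WITH", "bob")
query = friends_of("Alice")
```

Each string could then be sent through a Gremlin-compatible client or pasted into the Gremlin Console.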
When to Use It:
• Modeling and querying complex, highly connected data — social networks, recommendation engines, fraud detection, knowledge graphs, network topologies.
• When you need to perform traversal queries (e.g., "find all friends of friends" or "find the shortest path between two nodes").
• When relationships are a first-class concept in your data model and you need to query them efficiently.
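To see what "friends of friends" and "shortest path" mean operationally, here is a plain-Python sketch over a hypothetical friendship graph. It uses a set-based two-hop lookup and a breadth-first search; it illustrates the kind of work a graph traversal performs and is not how the Gremlin API executes queries.

```python
from collections import deque

# Hypothetical undirected friendship graph as an adjacency list.
FRIENDS = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def friends_of_friends(graph, start):
    """People exactly two hops away, excluding direct friends and the start."""
    direct = set(graph.get(start, []))
    two_hops = {fof for friend in direct for fof in graph.get(friend, [])}
    return two_hops - direct - {start}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the first shortest path found, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


fof = friends_of_friends(FRIENDS, "alice")
path = shortest_path(FRIENDS, "alice", "erin")
```

In a relational schema, each extra hop would typically mean another self-join on a friendships table, which is why these questions point to a graph model.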
Key Points:
• Graph databases excel at relationship-heavy queries that would require multiple expensive JOINs in relational databases.
• Gremlin is a traversal language: instead of declaring the result set you want, as in SQL, you describe a path to walk through the graph.
• Vertices and edges are both stored as JSON documents internally in Cosmos DB.
Comparing the Three APIs at a Glance
Cassandra API:
• Data Model: Wide-column
• Query Language: CQL (Cassandra Query Language)
• Compatible With: Apache Cassandra clients and tools
• Best For: High-throughput writes, time-series, IoT, migration from Cassandra
Table API:
• Data Model: Key-value (key-attribute)
• Query Language: OData / Azure Tables SDK
• Compatible With: Azure Table Storage SDKs
• Best For: Simple lookups, configuration data, migration from Azure Table Storage
Gremlin API:
• Data Model: Property graph (vertices and edges)
• Query Language: Gremlin traversal language
• Compatible With: Apache TinkerPop-compatible tools
• Best For: Relationship-rich data, social networks, recommendation engines, fraud detection
Common Concepts Across All APIs
Regardless of which API you choose, the following Cosmos DB fundamentals apply:
• Request Units (RUs): Throughput is measured and provisioned in RUs. This is the currency of Cosmos DB.
• Global Distribution: Data can be replicated to any number of Azure regions with a single click.
• Partitioning: Data is automatically partitioned for scalability. Choosing a good partition key is crucial.
• Consistency Levels: Five levels — Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.
• SLAs: Cosmos DB provides SLAs on availability, throughput, latency, and consistency.
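• Automatic Indexing: All data is automatically indexed by default (in the Cassandra API, columns outside the primary key are indexed only when you create a secondary index).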
• API is chosen at account creation time: You cannot switch APIs after the account is created. Each Cosmos DB account is associated with one API.
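As a rough worked example of the Request Units bullet, the sketch below estimates how many RU/s a workload might need. Every number in it is an assumption for illustration: 1 RU per point read of a small item, a placeholder of 5 RU per write, 20% headroom, rounding up to steps of 100 RU/s, and a 400 RU/s floor. Real costs depend on item size, indexing, and query shape, so measure actual request charges before sizing.

```python
import math


def estimate_ru_per_second(reads_per_sec, writes_per_sec,
                           ru_per_read=1.0, ru_per_write=5.0,
                           headroom=0.2, step=100, minimum=400):
    """Estimate provisioned RU/s under the stated (assumed) per-operation costs."""
    raw = reads_per_sec * ru_per_read + writes_per_sec * ru_per_write
    padded = raw * (1 + headroom)
    # Round up to the next multiple of `step`, then apply the assumed floor.
    rounded = math.ceil(padded / step) * step
    return max(minimum, rounded)


# 300 reads/s and 120 writes/s: 300*1 + 120*5 = 900 RU/s before headroom.
busy = estimate_ru_per_second(reads_per_sec=300, writes_per_sec=120)
quiet = estimate_ru_per_second(reads_per_sec=10, writes_per_sec=0)
```

With these assumptions the busy workload lands at 1,100 RU/s (900 plus 20% is 1,080, rounded up to the next 100), while the quiet one falls back to the assumed 400 RU/s floor.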
Exam Tips: Answering Questions on Azure Cosmos DB APIs for Cassandra, Table, and Gremlin
Here are targeted tips for the DP-900 exam:
Tip 1: Know the data model for each API.
The exam will often describe a scenario and ask you to pick the correct API. Remember: Cassandra = wide-column, Table = key-value, Gremlin = graph (vertices and edges). If the question mentions relationships, connections, traversals, or social networks, the answer is Gremlin. If it mentions migrating from Azure Table Storage or simple key-value lookups, the answer is Table. If it mentions CQL, time-series data, or migrating from Cassandra, the answer is Cassandra.
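As a study aid, the keyword matching described in this tip can be sketched in a few lines of Python. The keyword lists and the `suggest_api` function are hypothetical and deliberately simplistic; they mirror the rule of thumb above rather than any official decision procedure.

```python
# Hypothetical keyword lists drawn from the rule of thumb in this tip.
KEYWORDS = {
    "Cassandra API": ["cql", "cassandra", "wide-column", "time-series"],
    "Table API": ["table storage", "key-value", "partitionkey", "rowkey"],
    "Gremlin API": ["graph", "vertices", "edges", "traversal",
                    "relationships", "social network"],
}


def suggest_api(scenario):
    """Return the API whose keywords appear most often, or None if none match."""
    text = scenario.lower()
    scores = {
        api: sum(keyword in text for keyword in keywords)
        for api, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


table_case = suggest_api(
    "Migrate an app from Azure Table Storage with simple key-value lookups")
graph_case = suggest_api(
    "Model a social network and run traversals over relationships")
cassandra_case = suggest_api("Existing CQL workload storing time-series telemetry")
```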
Tip 2: Understand migration scenarios.
A common question pattern is: "An organization currently uses Apache Cassandra / Azure Table Storage and wants to move to a managed service with global distribution." The answer maps directly — Cassandra API for Cassandra workloads, Table API for Table Storage workloads. The key advantage is minimal code changes (wire-protocol or SDK compatibility).
Tip 3: Remember that the API is selected at account creation.
You cannot change the API of an existing Cosmos DB account. If a question implies switching APIs on an existing account, that is incorrect.
Tip 4: Don't confuse Azure Table Storage with Cosmos DB Table API.
They are compatible but different services. Cosmos DB Table API offers better performance, SLAs, global distribution, and automatic indexing on all properties. Azure Table Storage is cheaper but offers fewer features. If a question asks about premium features, the answer is Cosmos DB Table API.
Tip 5: Gremlin is for graph data — look for keywords.
Watch for keywords like: vertices, edges, nodes, relationships, graph, traversal, connected data, social network, recommendation engine, fraud detection, shortest path. These all point to the Gremlin API.
Tip 6: Know the query language for each API.
• Cassandra API → CQL
• Table API → OData queries via SDK
• Gremlin API → Gremlin traversal language
If a question references a specific query language, match it to the correct API.
Tip 7: All APIs share Cosmos DB's core features.
Questions may try to trick you into thinking one API has global distribution while another doesn't. All APIs benefit from Cosmos DB's global distribution, elastic scalability, multiple consistency levels, automatic indexing, and SLA guarantees.
Tip 8: Understand the partition key concept.
Regardless of API, Cosmos DB uses partition keys for data distribution. For Cassandra, this maps to the Cassandra partition key. For Table, it's the PartitionKey property. For Gremlin, you specify a partition key property on the graph container. Questions may ask about how data is organized or scaled — the answer involves partition keys.
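The idea behind partition keys can be illustrated with a simple hash-based placement sketch. The SHA-256 hash and the modulo step below are stand-ins chosen for illustration; they are not the hashing scheme Cosmos DB uses internally, and the key values are hypothetical.

```python
import hashlib
from collections import Counter


def physical_partition(partition_key, partition_count):
    """Map a partition key value to a partition index (illustrative hash only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Many distinct key values spread work across partitions...
spread = Counter(physical_partition(f"device-{n}", 4) for n in range(1000))

# ...while a single repeated key value always lands on the same partition.
hot = Counter(physical_partition("all-devices", 4) for _ in range(1000))
```

Items with the same key value always land together, which is why a key with many distinct, evenly used values scales better than one value shared by every item.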
Tip 9: Focus on "which API" scenario questions.
The DP-900 exam is a fundamentals exam. You are more likely to be asked to match a scenario to the correct API than to write Gremlin queries or CQL statements. Focus on understanding when to use each API rather than memorizing syntax.
Tip 10: Remember the five consistency levels.
All Cosmos DB APIs support the same five consistency levels. If a question asks about consistency options for the Cassandra or Gremlin API, the answer is the same: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual. Native Apache Cassandra instead exposes tunable per-operation levels (such as ONE, QUORUM, and ALL), which the Cassandra API maps onto the Cosmos DB levels.
Tip 11: Elimination strategy for tricky questions.
If you're unsure, eliminate options that don't match the data model. For example, if the scenario involves complex many-to-many relationships, eliminate Cassandra and Table (they don't handle graph traversals natively) and choose Gremlin. If the scenario is about flat, simple lookups with PartitionKey/RowKey, eliminate Gremlin and Cassandra and choose Table.
Summary
Azure Cosmos DB's support for Cassandra, Table, and Gremlin APIs demonstrates its versatility as a multi-model database. Each API serves a distinct purpose:
• Cassandra API — for wide-column workloads and CQL-based applications
• Table API — for key-value workloads and Azure Table Storage migrations
• Gremlin API — for graph workloads involving complex relationships
All three APIs inherit Cosmos DB's powerful platform features including global distribution, elastic scaling, automatic indexing, and comprehensive SLAs. For the DP-900 exam, focus on matching scenarios to the right API, understanding the data model each API supports, and knowing the shared underlying capabilities of Azure Cosmos DB.