Tokens, Embeddings, and Vectors
In the context of Generative AI fundamentals for the AWS AIF-C01 exam, Tokens, Embeddings, and Vectors are core concepts that underpin how large language models (LLMs) process and understand text.

**Tokens** are the basic units of text that a model processes. Rather than reading entire sentences, LLMs break input text into smaller pieces called tokens. A token can be a word, a subword, or even a single character, depending on the tokenization strategy. For example, the word 'unhappiness' might be split into tokens like 'un', 'happi', and 'ness'. Tokenization allows models to handle vast vocabularies efficiently, including rare or unseen words. The number of tokens directly impacts model cost, context window limits, and processing time in services like Amazon Bedrock.

**Embeddings** are dense numerical representations of tokens (or larger text units like sentences and documents) in a continuous vector space. Instead of treating words as discrete symbols, embeddings capture semantic meaning by mapping similar concepts to nearby points in a high-dimensional space. For instance, the embeddings for 'king' and 'queen' would be closer together than those for 'king' and 'bicycle'. Embeddings are learned during model training and enable the model to understand relationships, context, and nuance. AWS services like Amazon Titan Embeddings generate embeddings for use in search, retrieval-augmented generation (RAG), and recommendation systems.

**Vectors** are the mathematical arrays of numbers that represent embeddings. Each vector consists of hundreds or thousands of dimensions (floating-point numbers), where each dimension captures some abstract feature of the token's meaning.
Vectors enable mathematical operations such as calculating similarity (e.g., cosine similarity) between pieces of text. Vector databases, such as Amazon OpenSearch Service with vector search, store and retrieve these vectors efficiently for semantic search applications. Together, these three concepts form the pipeline: text is broken into **tokens**, converted into **embeddings**, and stored as **vectors**, enabling generative AI models to understand and generate human-like language.
Tokens, Embeddings, and Vectors: A Complete Guide for the AIF-C01 Exam
Why Are Tokens, Embeddings, and Vectors Important?
Tokens, embeddings, and vectors are the foundational building blocks that make generative AI and large language models (LLMs) possible. Without these concepts, AI models would have no way to process, understand, or generate human language. Understanding them is critical for the AWS Certified AI Practitioner (AIF-C01) exam because they underpin how models like Amazon Bedrock foundation models, GPT-based systems, and other generative AI tools operate internally. AWS expects candidates to understand these core mechanics to make informed decisions about AI services, model selection, and performance optimization.
What Are Tokens?
A token is the smallest unit of text that a language model processes. Rather than reading entire sentences or even whole words, AI models break text down into tokens before processing it.
Key facts about tokens:
- A token can be a whole word (e.g., "cat"), a subword (e.g., "un" + "believe" + "able"), a single character, or even punctuation.
- Tokenization is the process of splitting input text into these smaller units.
- Different models use different tokenization strategies. Common approaches include Byte Pair Encoding (BPE), WordPiece, and SentencePiece.
- On average, one token is roughly 3/4 of a word in English, meaning 100 tokens ≈ 75 words (this is a common approximation).
- The context window of a model is measured in tokens — it defines the maximum number of tokens a model can process in a single input+output sequence.
- Token limits affect both cost (many AI services charge per token) and performance (longer inputs require more compute).
Example:
The sentence "I love machine learning" might be tokenized as: ["I", "love", "machine", "learning"] — 4 tokens. A more complex word like "unbelievable" might become ["un", "believ", "able"] — 3 tokens.
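The split above, and the 100 tokens ≈ 75 words rule of thumb, can be sketched with a toy tokenizer. Note this is only an illustration: `toy_tokenize` and `estimate_tokens` are invented helpers, and real LLM tokenizers learn subword merges (BPE, WordPiece, SentencePiece) from data rather than splitting on word boundaries.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens.

    Real LLM tokenizers (BPE, WordPiece, SentencePiece) learn subword
    merges from training data; this regex split is only a toy stand-in.
    """
    return re.findall(r"\w+|[^\w\s]", text)

def estimate_tokens(word_count):
    """Rough rule of thumb from the text: 1 token is about 3/4 of a word."""
    return round(word_count / 0.75)

tokens = toy_tokenize("I love machine learning")
print(tokens)               # ['I', 'love', 'machine', 'learning']
print(len(tokens))          # 4 tokens
print(estimate_tokens(75))  # 75 words -> roughly 100 tokens
```

A real BPE tokenizer would additionally split rare words like "unbelievable" into learned subword pieces, which this toy version cannot do.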
Why tokens matter:
- They determine the input and output limits of a model.
- They directly impact pricing when using services like Amazon Bedrock.
- Understanding tokenization helps explain why models sometimes struggle with rare words, misspellings, or non-English languages.
What Are Embeddings?
An embedding is a numerical representation of a token, word, sentence, or even an entire document in a continuous vector space. Embeddings capture the semantic meaning of text so that the model can understand relationships between words and concepts.
Key facts about embeddings:
- Embeddings convert discrete text data (words/tokens) into continuous numerical data (vectors of floating-point numbers).
- Words with similar meanings are placed close together in the embedding space. For example, "king" and "queen" would have embeddings that are near each other.
- Embeddings capture relationships such as analogies. The classic example: vector("king") - vector("man") + vector("woman") ≈ vector("queen").
- Embeddings are learned during training — the model adjusts these numerical representations as it trains on large datasets.
- Embedding dimensions can vary. A typical embedding might have 768, 1024, or even 4096 dimensions, depending on the model architecture.
- Amazon Bedrock offers embedding models like Amazon Titan Embeddings that can generate embeddings for text and images.
- Embeddings are essential for tasks like semantic search, Retrieval-Augmented Generation (RAG), recommendation systems, and clustering.
Example:
The word "dog" might be represented as a vector like [0.21, -0.45, 0.89, 0.03, ...] with hundreds or thousands of dimensions. The word "puppy" would have a very similar vector, while "airplane" would be far away in this space.
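The "dog"/"puppy" closeness can be made concrete with cosine similarity. The 4-dimensional values below are invented purely for illustration; real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional "embeddings" for illustration only.
dog      = [0.21, -0.45, 0.89, 0.03]
puppy    = [0.25, -0.40, 0.85, 0.05]
airplane = [-0.70, 0.60, -0.10, 0.55]

print(cosine_similarity(dog, puppy))     # close to 1.0 (similar meaning)
print(cosine_similarity(dog, airplane))  # much lower (unrelated meaning)
```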
Why embeddings matter:
- They are how AI models "understand" the meaning of language.
- They enable similarity comparisons between texts.
- They are the bridge between raw text and the mathematical operations that neural networks perform.
- They are crucial for RAG architectures, where documents are stored as embeddings in a vector database and retrieved based on semantic similarity to a query.
What Are Vectors?
A vector is an ordered list (array) of numbers that represents a point in a multi-dimensional space. In the context of AI, vectors are the data structure used to store embeddings.
Key facts about vectors:
- Each embedding is stored as a vector — a list of floating-point numbers.
- The number of elements in the vector is called its dimensionality. Higher dimensionality can capture more nuanced meaning but requires more compute and storage.
- Vector similarity is measured using mathematical techniques such as:
• Cosine similarity — measures the angle between two vectors (most common for text).
• Euclidean distance — measures the straight-line distance between two vectors.
• Dot product — measures alignment and magnitude of two vectors.
- Vector databases (such as Amazon OpenSearch Service with vector search, or Pinecone, FAISS, etc.) are specialized databases designed to store and efficiently search through large collections of vectors.
- In RAG architectures, user queries are converted to vectors, and the system searches a vector database to find the most semantically similar document chunks.
Example:
If you embed the query "How do I train a machine learning model?" into a vector and compare it against a database of document embeddings, documents about ML training pipelines would have the highest cosine similarity scores and be returned as the most relevant results.
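The three similarity measures listed above can be sketched in plain Python. The vectors here are arbitrary small examples, not real embeddings.

```python
import math

def dot(a, b):
    """Dot product: reflects both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Angle between vectors, ignoring magnitude: 1.0 = same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

q  = [1.0, 2.0, 3.0]
d1 = [1.1, 1.9, 3.2]    # nearly the same direction as q
d2 = [-3.0, 0.5, -1.0]  # points a different way

print(cosine_similarity(q, d1) > cosine_similarity(q, d2))    # True
print(euclidean_distance(q, d1) < euclidean_distance(q, d2))  # True
```

All three metrics agree here, but they can disagree when vector magnitudes differ, which is one reason cosine similarity (magnitude-independent) is the usual choice for text.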
Why vectors matter:
- They are the underlying data format for all embedding-based operations.
- Vector search enables semantic search capabilities that go beyond keyword matching.
- Understanding vectors helps you grasp how RAG, recommendation systems, and similarity-based applications work.
How Tokens, Embeddings, and Vectors Work Together
Here is the end-to-end flow of how these concepts connect:
1. Tokenization: Raw text input is broken down into tokens.
2. Embedding lookup/generation: Each token is converted into an embedding — a dense numerical vector representation.
3. Processing: The model performs mathematical operations on these vectors (through layers of the neural network, such as transformer attention mechanisms).
4. Output generation: The model produces output vectors that are decoded back into tokens, which are then converted back into human-readable text.
For search/retrieval use cases:
1. Documents are tokenized, embedded, and stored as vectors in a vector database.
2. A user query is tokenized and embedded into a vector.
3. The query vector is compared against stored document vectors using similarity measures.
4. The most similar documents are retrieved and optionally fed to a generative model (RAG pattern).
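The four retrieval steps above can be sketched as a minimal in-memory example. The `embed` function here is a stand-in character-frequency "embedding" so the sketch runs without any model; a real system would instead call an embedding model (for example, Amazon Titan Embeddings via the Bedrock API) and store the vectors in a vector database.

```python
import math

def embed(text):
    """Stand-in embedding: a 26-dim letter-frequency vector.

    Purely illustrative; real embeddings come from a trained model.
    """
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: "index" documents as vectors in an in-memory store.
documents = [
    "how to train a machine learning model",
    "recipe for chocolate cake",
    "history of the roman empire",
]
store = [(doc, embed(doc)) for doc in documents]

# Step 2: embed the user query into a vector.
query_vec = embed("training machine learning models")

# Step 3: rank stored documents by similarity to the query vector.
ranked = sorted(store, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)

# Step 4: the top hit would be passed to the generative model (RAG).
print(ranked[0][0])  # "how to train a machine learning model"
```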
Key Relationships to Remember for the Exam
- Tokens → Embeddings → Vectors: Text becomes tokens, tokens become embeddings, and embeddings are represented as vectors.
- Context window = token limit: Models can only handle a fixed number of tokens at once.
- Semantic similarity = vector closeness: Similar meanings result in vectors that are close in the embedding space.
- RAG relies on embeddings and vector search: Documents are stored as embeddings in vector databases and retrieved by comparing query embeddings.
- Cost is often token-based: AWS services like Amazon Bedrock charge based on the number of input and output tokens processed.
Exam Tips: Answering Questions on Tokens, Embeddings, and Vectors
1. Know the definitions clearly: The exam may test your ability to distinguish between tokens, embeddings, and vectors. Remember: tokens are text units, embeddings are learned numerical representations, and vectors are the data structure (arrays of numbers) that hold embeddings.
2. Understand token limits and context windows: If a question asks about model limitations regarding input length, the answer relates to the token limit or context window. Longer documents may need to be chunked or summarized to fit within the context window.
3. Connect embeddings to RAG: Questions about how to incorporate external or proprietary data into an LLM's responses will likely involve RAG, which requires embeddings and vector databases. Know that Amazon Titan Embeddings and Amazon OpenSearch Service are relevant AWS services.
4. Cosine similarity is the go-to metric: When the exam asks about measuring how similar two pieces of text are in an embedding space, cosine similarity is typically the best answer.
5. Recognize cost implications: If a question involves reducing costs for a generative AI application, consider strategies that reduce the number of tokens processed — shorter prompts, summarization, or efficient prompt engineering.
6. Don't confuse tokens with words: The exam may try to trick you by equating tokens with words. Remember that tokens can be subwords, characters, or punctuation — one word can be multiple tokens.
7. Link embeddings to semantic understanding: If a question asks how a model understands that "happy" and "joyful" are similar, the answer involves embeddings — these words have similar vector representations in the embedding space.
8. Vector databases are for search, not training: Know the difference — vector databases store precomputed embeddings for retrieval. They are not used to train models; they are used at inference time for similarity search.
9. Watch for questions about dimensionality: Higher-dimensional embeddings capture more detail but are more expensive to store and search. The exam may test trade-offs between embedding quality and computational cost.
10. Remember the AWS-specific services: For the AIF-C01 exam, know these key services:
• Amazon Titan Embeddings — for generating text and multimodal embeddings
• Amazon Bedrock — the managed service for accessing foundation models (token-based pricing)
• Amazon OpenSearch Service — supports vector search for RAG implementations
• Amazon Kendra — intelligent search service that can work alongside generative AI
11. Elimination strategy: If you're unsure, eliminate answers that confuse the hierarchy (e.g., claiming vectors are converted into tokens, or that embeddings are plain text). The flow is always: text → tokens → embeddings (vectors) → model processing → output tokens → text.
12. Practice scenario-based thinking: Many AIF-C01 questions are scenario-based. When you see a question about building a chatbot that uses company documents, think: documents need to be chunked, embedded into vectors, stored in a vector database, and queried using embedding similarity — this is the RAG pattern built on tokens, embeddings, and vectors.