Prepare the Data

Get, clean, transform, and load data from various sources using Power Query.

Covers connecting to data sources, configuring data source settings, profiling and cleaning data, resolving import errors, transforming data using Power Query, and preparing data for loading into the data model.
5 minutes · 5 questions

Preparing data in Power BI is a critical phase that transforms raw information into a clean, structured format suitable for analysis and visualization. This work happens primarily in Power Query Editor, where analysts shape and refine their datasets before loading them into the data model.
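As a rough illustration of the kind of shaping that happens in Power Query Editor, here is a minimal M sketch; the file path and column names are invented for illustration:

let
    // Hypothetical CSV source; the path and delimiter are assumptions
    Source = Csv.Document(File.Contents("C:\data\sales.csv"), [Delimiter = ",", Encoding = 65001]),
    // Use the first row as column headers
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Assign explicit types so the model receives dates and numbers, not text
    Typed = Table.TransformColumnTypes(Promoted, {{"OrderDate", type date}, {"Amount", type number}}),
    // Drop rows with no amount before loading
    NonBlank = Table.SelectRows(Typed, each [Amount] <> null)
in
    NonBlank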

Concepts covered: Identify and connect to data sources; Connect to shared semantic models; Change data source settings and credentials; Configure privacy levels; Choose between DirectQuery and Import; Create and modify parameters; Evaluate data statistics and column properties; Resolve data inconsistencies and null values; Resolve data quality issues; Resolve data import errors; Select appropriate column data types; Create and transform columns; Group and aggregate rows; Pivot, unpivot, and transpose data; Convert semi-structured data to tables; Create fact tables and dimension tables; Reference and duplicate queries; Merge and append queries; Identify and create relationship keys; Configure data loading for queries

Test mode: PL-300 - Prepare the Data Example Questions

Test your knowledge of Prepare the Data

Question 1

A financial technology startup is developing a Power BI solution that processes cryptocurrency trading data from multiple exchange APIs. The data engineering team has structured their Power Query Editor with 28 queries: 'APIConnectors' queries (4) establish connections and handle authentication tokens, 'RawTrades' queries (8) pull historical trade data from each exchange, 'Normalization' queries (10) standardize currency pairs, timestamps, and price formats across different exchange schemas, and 'Analytics' queries (6) produce aggregated trading metrics for the dashboard. After the initial deployment, the DevOps team reports that the Power BI Service gateway is experiencing memory pressure during scheduled refreshes. Memory profiling reveals that the 'Normalization' queries, which contain complex string parsing and cross-reference lookups, are creating large intermediate result sets that persist in the semantic model as separate tables with millions of rows each. The team confirms that only the 'Analytics' queries should exist as model tables, but the 'Normalization' queries must continue processing during refresh to feed data into the 'Analytics' queries. The lead architect needs to reconfigure the solution to eliminate the intermediate table storage while preserving the data transformation pipeline. Which specific action should be performed on each 'Normalization' query to achieve this requirement?
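For context, a minimal sketch of the staging pattern this scenario describes, with all query and column names invented: in Power BI Desktop, each query's "Enable load" setting (right-click the query in Power Query Editor) controls whether its result is materialized as a model table, and a query with load disabled is still evaluated during refresh whenever downstream queries reference it.

// Hypothetical staging query, e.g. "Normalization_ExchangeA"
let
    // Reference an upstream query by name (invented here)
    Source = RawTrades_ExchangeA,
    // Standardize timestamps and prices across exchange schemas
    Standardized = Table.TransformColumnTypes(
        Source,
        {{"TradeTimestamp", type datetimezone}, {"Price", type number}}
    ),
    // Normalize currency-pair casing
    PairFixed = Table.TransformColumns(Standardized, {{"CurrencyPair", Text.Upper, type text}})
in
    PairFixed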

Question 2

In Power Query M, which function parameter in Table.AddColumn specifies the expression used to compute values for each row of the new column?
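For reference, the full signature is Table.AddColumn(table as table, newColumnName as text, columnGenerator as function, optional columnType as nullable type). A minimal sketch with invented data:

let
    Source = #table({"Price", "Qty"}, {{10, 2}, {5, 4}}),
    // The third argument (columnGenerator) is evaluated once per row;
    // "each [Price] * [Qty]" is shorthand for a function of the current row
    WithTotal = Table.AddColumn(Source, "Total", each [Price] * [Qty], type number)
in
    WithTotal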

Question 3

What is the primary purpose of the 'Table.RemoveRowsWithErrors' function in Power Query when resolving data import issues?
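As a minimal sketch with invented data: converting non-numeric text produces cell-level errors, which Table.RemoveRowsWithErrors then removes row by row, optionally scoped to a list of columns:

let
    Source = #table({"Raw"}, {{"10"}, {"abc"}, {"30"}}),
    // Number.FromText("abc") raises an error, leaving an error value in that cell
    Parsed = Table.TransformColumns(Source, {{"Raw", Number.FromText, type number}}),
    // Remove any row whose "Raw" cell holds an error; omit the second
    // argument to check every column
    Cleaned = Table.RemoveRowsWithErrors(Parsed, {"Raw"})
in
    Cleaned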

More Prepare the Data questions: 797 questions (total)