A semantic model provides meaning to data by using objects, calculations, and logic to represent a real-world process, like sales or pricing. A semantic model is a core component of Microsoft Fabric and Power BI; it’s one of the most important item types you manage in an end-to-end solution, as it’s the primary way in which your end users interact with the data. It’s therefore important for you to know how to design, build, and deploy good semantic models efficiently so that you can meet business data needs.
In this series, we provide some tips and guidance to help you effectively build a semantic model. In the previous article, we described how to evaluate your data before you start building your model, so that you can identify and resolve possible issues. That step also helps you understand which transformations you need to get your data into the right shape for a good model.
In this article, we show you how to connect to and transform data when you build your model by using Tabular Editor or Power BI Desktop.
NOTE: To transform your data for a semantic model, you have many options in Fabric and Power BI. Depending upon your needs, available resources, skills, and preferred workflow, you can choose a different tool; there’s no “best” approach. Generally, it’s considered a good practice to shift these transformations as far upstream as possible, and as far downstream as necessary (a paradigm often quoted as “Roche’s maxim”). The reason for this is that downstream transformations tend to be more difficult to maintain and can have performance consequences.
When transforming your data, your objective is to create a data structure that best fits the intended purpose of the model. Typically, this involves creating a star schema with distinct fact and dimension tables, but the precise structure of your model varies depending on your requirements, and what you’re trying to achieve.
You have different tools available to transform data in both Fabric and Power BI. Some common examples of these tools are depicted in the following diagram.
NOTE: Since this series is focused on semantic models, the following is only a brief list of the available options; explaining the differences and considerations of each approach is out of scope for this article. The right choice also varies greatly depending on your needs, skills, and situation.
The following are examples of ways you might transform your data by using Power BI:
TIP: If you’ll be transforming data in Power Query, read Best practices when working with Power Query in the Microsoft documentation. That article has helpful tips to improve the performance of your queries.
Generally, you want to use the appropriate upstream tools to do the “heavy lifting” transformations before the data reaches the semantic model. This is essential when you intend to build a semantic model that uses Direct Lake (only available in Fabric) or DirectQuery storage mode. However, when you build an import model, it’s common to still need some light to moderate transformations in Power Query (see the sketch after the following list). Examples of such transformations include:
Reducing detail by limiting rows, columns, or decimal precision (all of which are essential steps to reduce the size and increase the performance of your import model).
Adding tables necessary for reporting, like measure tables, date tables, or selection tables.
Adding columns necessary for reporting, such as for sorting, binning, or grouping data in certain ways.
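To illustrate, here’s a minimal Power Query sketch of these lighter transformations for an import model. It assumes a hypothetical vw_Sales view in a SQL database; the server, database, table, and column names are placeholders you’d replace with your own.

```
let
    // Hypothetical source: a Sales view in a SQL database (names are placeholders)
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    SalesView = Source{[Schema = "dbo", Item = "vw_Sales"]}[Data],

    // Reduce detail: keep only the columns the model needs
    KeepColumns = Table.SelectColumns(
        SalesView,
        {"OrderDate", "ProductKey", "CustomerKey", "SalesAmount"}
    ),

    // Reduce detail: limit rows to the period being reported on
    RecentRows = Table.SelectRows(
        KeepColumns,
        each [OrderDate] >= #date(2022, 1, 1)
    ),

    // Reduce detail: lower decimal precision by using a fixed decimal (currency) type
    FixedPrecision = Table.TransformColumnTypes(
        RecentRows,
        {{"SalesAmount", Currency.Type}}
    ),

    // Add a column needed for reporting, for example a sort/grouping key
    AddYearMonth = Table.AddColumn(
        FixedPrecision,
        "YearMonthKey",
        each Date.Year([OrderDate]) * 100 + Date.Month([OrderDate]),
        Int64.Type
    )
in
    AddYearMonth
```

Where possible, keep these steps simple and foldable so the source system does the work during refresh.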
In the rest of this article, we’ll show you how you can connect to and transform data in Power Query using Tabular Editor, Power BI Desktop, or a combination of both.
Performing transformations in Power Query for a semantic model can be a perfectly valid approach, depending upon your model and data. Before you start transforming the data, however, be aware that performing many transformations in the semantic model can be problematic. For instance, this approach doesn’t scale as well for larger data volumes and complex transformations as other, upstream tools do.
To connect to and transform your data in Power Query while building your model in Tabular Editor, you can take one of three different approaches.
For supported sources like a SQL Endpoint or dataflow, the table import wizard in Tabular Editor will create the objects and generate the Power Query code for you. Use this approach when:
You are connecting to a supported source.
You don’t need to use the Power Query user interface (UI).
You don’t need to view the data as you’re transforming it with Power Query / M.
To use this approach, follow these steps:
From the ‘Model’ menu, select “Import tables”.
Select “Use implicit data source (credentials managed by the Fabric/Power BI Service)”.
Select your data source from the list.
Enter the appropriate connection information and authenticate to connect to your data source.
Choose the source tables/views that you want to import metadata from. While selecting a table or view, you can select specific columns and even preview the data.
Proceed through the wizard and confirm; Tabular Editor will automatically add the objects to the model.
Select the table partition to view (and adjust) the Power Query (M) code.
With this approach, you can push transformations back to an upstream tool and connect to the transformed data in one of the supported sources (like Fabric OneLake). If you’re one of the rare people skilled in writing M code, you can further adjust the partition expression right in Tabular Editor.
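To give a sense of what this looks like, the partition expression generated for a supported source typically has a simple shape like the following sketch. The endpoint, database, and table names are placeholders, and the exact code Tabular Editor generates depends on your source and version; treat this only as an illustration.

```
let
    // Illustrative partition expression for a table in a Fabric SQL analytics endpoint
    // (the endpoint address and names are placeholders; the wizard generates the equivalent for you)
    Source = Sql.Database("xxxxxxxx.datawarehouse.fabric.microsoft.com", "MyLakehouse"),
    DimCustomer = Source{[Schema = "dbo", Item = "DimCustomer"]}[Data]
in
    DimCustomer
```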
However, what if you’re connecting to a source that’s not supported by the table import wizard, like SharePoint? Or, what if you want to do further transformations and need the Power Query UI?
You can connect to your data source using the Power Query user interface of Power BI Desktop, Excel, or dataflows, then copy the M code to Tabular Editor when you’re ready. With this approach, you benefit from using the Power Query UI and seeing the data as you transform it; however, it’s a manual step to switch tools.
Use this approach when:
You’re connecting to an unsupported source.
You’re comfortable with the Power Query experience.
You have a reason to prefer working on model metadata in “disconnected mode”, instead of maintaining a constant connection or importing the data like you would in Power BI Desktop.
To use this approach, follow these steps:
Connect to your data source from your client tool of choice, as you normally would.
Transform your data using Power Query, as you normally would.
When ready, open the Advanced Editor and copy your code (an example of what this code might look like follows these steps).
In Tabular Editor, create a new table.
Select the table partition and paste the M code that you copied.
Save your changes, then right-click the table, and select “update table schema”. Tabular Editor will automatically create the columns and set their data types based on your code.
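As an illustration, the code you copy from the Advanced Editor might look something like the following sketch for a CSV file stored in SharePoint; the site URL, file name, and column names are placeholders you’d replace with your own.

```
let
    // Example of M copied from the Advanced Editor: a CSV file stored in SharePoint
    // (the site URL, file name, and column names are placeholders)
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/Sales", [ApiVersion = 15]),
    BudgetFile = Source{[Name = "Budget.csv", #"Folder Path" = "https://contoso.sharepoint.com/sites/Sales/Shared Documents/"]}[Content],
    ImportedCsv = Csv.Document(BudgetFile, [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]),
    PromotedHeaders = Table.PromoteHeaders(ImportedCsv, [PromoteAllScalars = true]),
    ChangedTypes = Table.TransformColumnTypes(
        PromotedHeaders,
        {{"Year", Int64.Type}, {"Department", type text}, {"BudgetAmount", Currency.Type}}
    )
in
    ChangedTypes
```

After you update the table schema, check that the column data types in the model match what the query produces.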
You can also continue to develop the model from Power BI Desktop, saving and deploying the Power BI Project (.pbip) file. In this scenario, you benefit from the productivity enhancements of Tabular Editor, while having the convenience of the Power BI Desktop user interface for managing queries and other features.
Use this approach when:
You are most comfortable using Power BI Desktop to develop and publish semantic models.
You are using a Tabular Editor 3 Desktop Edition license.
To use this approach, follow these steps:
Connect to and transform your data source from Power BI Desktop, as you normally would.
Once you connect to your data, proceed with development using a combination of Power BI Desktop and Tabular Editor, as you see fit. You can open Tabular Editor 2 or 3 from the ‘External Tools’ ribbon.
Many semantic models contain additional tables aside from those you add from the data source. These tables typically serve a specific function within the model or for a connected report. Examples of these tables include:
Measure tables: Tables used to organize measures separately from other model objects.
Date table: A table made in Power Query or DAX for time intelligence (if a sufficient table isn’t available from the data source); a minimal Power Query sketch follows this list.
Last refresh: For import storage mode models, a table with a scalar value containing the date and time of the last successful model refresh.
Static tables: Tables used for various purposes, typically to achieve specific reporting functionalities like dynamic measure selection.
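For example, the date and last-refresh tables can be created with short Power Query queries like the following minimal sketches. The date range and the extra columns are illustrative placeholders that you’d adapt to your model.

```
let
    // Minimal date table sketch; adjust the range to cover your data
    StartDate = #date(2020, 1, 1),
    EndDate = #date(2025, 12, 31),
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    DateList = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    DateTable = Table.FromColumns({DateList}, type table [Date = date]),
    AddYear = Table.AddColumn(DateTable, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth = Table.AddColumn(AddYear, "Month", each Date.Month([Date]), Int64.Type),
    AddMonthName = Table.AddColumn(AddMonth, "Month Name", each Date.MonthName([Date]), type text)
in
    AddMonthName
```

```
let
    // "Last refresh" sketch for import models: a single-row table holding the refresh timestamp
    LastRefresh = #table(type table [#"Last Refresh" = datetime], {{DateTime.LocalNow()}})
in
    LastRefresh
```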
Often, you need to add the same static tables when you create a new model. This is therefore a good process to automate by using a C# script in Tabular Editor. For instance, this script creates a measure table, and this script creates a date table.
Once you’ve finished and have all the tables in your model, you can proceed with creating relationships, authoring DAX, and adding additional business logic.
Once you’ve gathered the requirements for your semantic model, the next step is to connect to your data. Before you proceed, ensure your data is model-ready; if it isn’t, address data quality issues and transform the data with an appropriate tool of your choice. If you need to transform data in Power Query, you can take various approaches while you develop your model in Tabular Editor, Power BI Desktop, or a combination of the two.
In the next article of this series, we’ll share how you can create, test, and manage relationships in your semantic model with Tabular Editor and Power BI Desktop.