You're Deploying it Wrong! - AS Edition (Part 4)

This is part 4 of the Analysis Services DevOps blog series. Go to part 3

Refining the build pipeline

In the last chapter, we saw how to set up the first basic build pipeline, which used Tabular Editor to deploy a model from a .bim file or folder structure unto an instance of Analysis Services. It’s time to take this one step further. In real life, there are a couple of things we’d like our build pipeline todo. Essentially, we want to make sure that we’re producing an artifact that is ready for deployment, so for a Tabular model, this typically means the following:

  • Best Practice Analysis: Ensure that Best Practice Rules are obeyed.
  • Schema check: Ensure that the model can connect to its data source(s) and source columns map correctly to imported columns.
  • Validation deployment: Ensure that the model does not contain invalid DAX or other semantic errors (for example, circular dependencies).
  • Refresh Check: Ensure that partitions can be refreshed without errors or warnings.
  • Unit Testing: Ensure that calculations provide expected results.
  • Prepare artifact: Create a Model.bim file containing everything needed for deployment.

The 6 steps above roughly correspond to the tasks that we need in our build pipeline. Once we have this pipeline set up, we can use branch policies, to ensure that all changes build succesfully before a pull request can be approved.

In the following, we will assume that your Tabular Model is saved as a folder structure within the “AdventureWorks” folder in the root of your git repo. If it’s located somewhere else, change the first argument of the calls to TabularEditor.exe from $(Build.SourcesDirectory)\AdventureWorks to point to the Model.bim file or to the folder housing the database.json file of your tabular model.

Step 1 - Best Practice Analysis

If you’re using Tabular Editors Best Practice Analyzer (as you should), it makes a lot of sense to check your model for any rule violations before we do anything else. Most people use the Best Practice Analyzer to check things like naming conventions, that numeric or foreign key columns have been hidden, that Descriptions have been provided for all visible objects, etc. The Tabular Model developer is generally responsible for making sure that no rule violations exist before they commit their code, but just in case someone forgot this, we should run the analysis as part of our build pipeline.

Start by creating a new empty build pipeline and make sure that TabularEditor.exe is available, as described in the last chapter.

Add a new command line task, with the following command:

TabularEditor.exe "$(Build.SourcesDirectory)\AdventureWorks" -A -V

The -A switch instructs Tabular Editor to run the Best Practice Analyzer. This is only meaningful if your model contains a set of rules or links to rule definitions, or if a BPARules.json file exists within the agents %ProgramData%\TabularEditor or %LocalAppData%\TabularEditor folders.

image

If you want to have more control over which rules your model is validated against, you can also specify the path of a rules file like so (assuming there’s a BPARules.json file in the root of your git repo):

TabularEditor.exe "$(Build.SourcesDirectory)\AdventureWorks" -A "$(Build.SourcesDirectory)\BPARules.json" -V

In this case, the model will still be validated against rules defined within the model, or rules linked within the model, as on the screenshot above, but rules residing in the %ProgramData%\TabularEditor and %LocalAppData%\TabularEditor folders are no longer considered.

The -V switch instructs Azure DevOps to treat rule violations differently, depending on the severity level of the rule being broken. Severity level 3 violations are reported as errors (causing the build to fail), level 2 are reported as warnings (causing the build to partially succeed), and level 1 are only informational.

At this point, your pipeline definition should look something like this:

image

Executing the build, we would see the following:

image Column references should ALWAYS be qualified with the table name. Marco Russo would probably argue that this rule should be set to severity level 3, to cause an error instead of a warning.

Great! Let’s move on to ensuring that source columns are correctly mapped to imported columns.

Step 2 - Schema Check

Note: Schema Check only works for supported provider (legacy) data sources: OLE DB, ODBC, etc.

Model.bim or Database.json files normally don’t store credentials for accessing the data sources used by the model. For this reason, we often have to apply these credentials after deploying. Even if we are able to use integrated security, we might still want to change the connection string of a data source during deployment, for example in order to point it to a different database. This makes sense in a multi-tiered, multi-environment BI set up: Your production models use data from your production DWH, your UAT models use data from your UAT DWH, etc.

You have probably noticed, that passwords are never stored as clear text in the connection string of legacy data sources, when a model is saved to disk, which is a good thing. Credentials and other sensitive information should not appear in your git repository. In order to perform the schema check, we must update the connection strings on our model data sources, but how do we achieve this, without exposing sensitive information in build logs, etc.?

The solution is to create a Tabular Editor (C#) script, which will be executed from the command line using the -S switch. First, create the following script, and put it within a “scripts” folder in your repository - name it “SetConnectionStringFromEnv.cs”:

foreach(var dataSource in Model.DataSources.OfType<ProviderDataSource>())
{
    var evName = dataSource.Name.Replace(" ", "") + "ConnectionString";
    var evValue = Environment.GetEnvironmentVariable(evName);
    if (evValue != null)
        dataSource.ConnectionString = evValue;
}

This will loop through all (provider) data sources of your model, and replace the connection string on each with the value of an environment variable whose name corresponds to the name of the datasource (with spaces removed) and the “ConnectionString” suffix. So if you have a data source in your model named “SQL DW”, the script will use the value of the SQLDWConnectionString environment variable as the connection string of the model (provided the environment variable exists).

Scripts are always executed on the model loaded into Tabular Editor, before other commands (schema check, best practice analysis, deployment, etc.) are performed. This means we can execute the string to safely update the connection strings AND perform the schema check, with the same instance of Tabular Editor. Create a new command line task, and use the following command within it:

TabularEditor.exe "$(Build.SourcesDirectory)\AdventureWorks" -S "$(Build.SourcesDirectory)\AdventureWorks\scripts\SetConnectionStringFromEnv.cs" -SC -V

On the “Variables” tab, create a new variable and provide the full connection string as its value. Click the little “lock” icon to secure it:

image

Back on the “Tasks” tab, map the pipeline variable to a new environment variable. If you have multiple data sources, remember to create both a pipeline variable and an environment variable for each of them:

image

Thanks to the -SC switch, Tabular Editor will establish a connection to each data source, and compare the schema of each partition query against the imported columns of the table. If any columns are missing in the source, an error is generated. Other differences (mismatched data types or superfluous columns) will generate a warning.

Remember that for this to work, the agent running the build pipeline must be able to reach the data source. This means that if your source is an Azure SQL Databases, for example, the IP of the build agent needs to be whitelisted. For a Microsoft-hosted build agent, you can set the “Allow Azure services and resources to access this server”:

image

And behold - Tabular Editor now informs us that we have unmapped columns (in this case, it’s only a warning, since processing should still succeed, but in the words of Brent Ozar, you’re needlessly yelling data across the network, so you might want to look into this at some point):

image

Step 3 - Validation deployment

Since Tabular Editor can not perform DAX syntax checking and semantic validation, we have to deploy the model to an instance of Analysis Services to make sure there are no such errors. This is exactly what we did in the last chapter. Let’s improve the command line task a little bit, using the same technique as above, to securely update data source connection strings. Also, as mentioned in the last chapter, a better practice for passing credentials used to connect to Analysis Services, is through environment variables. So go ahead and add an additional (secured) pipeline variable for your Analysis Services connection string, and an (unsecured) pipeline variable for the name of the database you want to deploy:

image

We will use the following command to execute the same script as in step 2, while also deploying the model:

TabularEditor.exe "$(Build.SourcesDirectory)\AdventureWorks" -S "$(Build.SourcesDirectory)\AdventureWorks\scripts\SetConnectionStringFromEnv.cs" -D "%ASConnectionString%" "$(ASDatabase)" -O -C -P -R -M -V -E -W

Here, we’re transferring the connection string of our Analysis Services instance to Tabular Editor through an environment variable, %ASConnectionString%, where as the name of the database to deploy is passed directly from the pipeline variable, $(ASDatabase). Notice the different syntax between the two.

image

This approach is better than what we did in the previous chapter, because we can still see the executing command in the output of the build, but the credentials will not be exposed anywhere:

image

Step 4 and 5 - Refresh check and unit testing

Unit testing on a Tabular model typically requires some data. For example, it would make sense to check whether the measure [Sales Amount] returns the same figure as the SQL query SELECT SUM([Sales Amount]) FROM fact.ResellerSales. For this to work, however, we need to have data within the model as well, which would require some kind of refresh. At the same time, we don’t want our build pipeline to take too long to complete, which would be the case if Analysis Services had to refresh lots of data. For this reason, consider providing a reduced test dataset for unit testing. There are various ways to do this (which I will cover later), but regardless of how you make the test data available, make sure that the data volume is small enough that a Full refresh can execute in a timely manner (for example, less than 5 minutes).

Even if you don’t want to do any unit testing or can’t provide a test dataset, I still recommend executing a Calculate refresh within the build pipeline, to make sure that the model doesn’t have any bad DAX on calculated tables, calculated columns or calculation groups.

Since Tabular Editor cannot perform refreshes on a tabular model, we have to resort to other methods in order to check whether the model can be refreshed. The easiest approach is a PowerShell script, using the Invoke-AsCmd cmdlet, to execute a TMSL script for refreshing.

For unit testing, the easiest approach is setting up a unit testing framework, such as Pester or NBi.

If you need more details on this area, please provide a comment below, and I will consider it for a later chapter of this blog.

Step 6 - Preparing the artifact

The last step we need in the build pipeline, is to publish the artifact that will be carried over to the release pipelines, and used for deployment. Let’s say your build pipelines includes additional Tabular Editor scripts, that auto-generate measures and other objects, that are not stored in source control. This could be the case if you’re using the Master Model Pattern or other kinds of automation. You want to make sure that your artifact is “ready-for-takeoff”.

Add one additional command line task, which will use the -B switch to save the model into a .bim file. Here, you can include any additional scripts that you need, but avoid replacing connection strings with their sensitive counter parts. You will do this when it’s time for deployment in the release pipelines (also, the artifact should be deployable to multiple different environments - not just one specific environment - so at this stage you might not even know what the connection strings should point to).

Use the following command:

TabularEditor.exe "$(Build.SourcesDirectory)\AdventureWorks" -B "$(Build.ArtifactStagingDirectory)\Model.bim"

Lastly, add a “Publish build artifact” task:

image

And we’re done!

In the next chapter, we will see how the release pipelines are created, and how we can set up continuous integration to trigger our build pipeline automatically, whenever code is pushed to git.

Written on October 17, 2019 by Daniel Otykier