Large Language Models (LLMs) – AI models that can predict text based on huge volumes of training data – are rapidly being integrated into BI workflows through copilots in everything from Fabric to Power BI. LLMs bring immense promise of unlocking new levels of insight from data – but also immense hype. In this blog, we’ll provide a level-headed assessment of how LLMs can help developers and users of semantic models. We’ll also show how semantic models can ground LLMs in business relationships to provide accurate and helpful answers to business-critical questions.
LLMs are fundamentally a semantic technology. They excel at transforming text into meaningful predictions and are used for tasks ranging from writing apps from scratch to summarizing complex financial regulations. Given their capabilities, it's natural to ask how LLMs can enhance another semantic technology: semantic models.
As a brief reminder, semantic models represent the meaning of data and the structured relationships between different data entities. As Kurt Buhler explains, "A semantic model is essential for you to meet business data needs" by providing a structured representation that maps relationships between data entities and their business meaning. For instance, a well-built semantic model can allow users to integrate insights across datasets to answer business critical questions like “Which market is experiencing the highest growth?”.
In this blog post, we'll explore where and how LLMs can improve the workflow of building and using semantic models, and vice versa.
Different types of LLMs and their use cases
The first thing to note about LLMs is that they're not just a single technology. There are many different types of LLMs, each with their own strengths and weaknesses.
Figure: A flowchart illustrating the evolution of large language models (LLMs) in four stages.
At the core of all of these is the so-called "base" or "foundation" model - a model trained simply to predict the next sub-word given a huge training corpus, often spanning billions of words and encompassing much of the content available on the internet – like the original GPT-3. While these foundation models form the basis of all other models, they can be difficult to use because they simply predict the next word without any concept of interaction.
In this section, we'll explore three iterations of LLMs: chat LLMs, reasoning LLMs, and LLM agents. Each builds upon the previous version to unlock new capabilities.
Chat LLMs
The first step in making LLMs more useful is to enable interaction in a conversational format. The key innovation here is training the LLM to follow instructions, which was the breakthrough of ChatGPT when it was released in November 2022. This seemingly simple change had a tremendous impact, similar to how a well-structured Power BI report can unlock insights already present in raw data tables.
While many readers have likely interacted with chat LLMs, there are three specific ways they can be especially useful for developing and using semantic models:
- Translation: In a previous blog post, we explored how an LLM can be used to translate measures between languages, which can save significant time. However, to unlock the full power of chat LLMs, we need to understand translation in a wider sense - translating between domains, as discussed by experts from the Oxford Internet Institute. This could involve:
- Translating terms familiar to the sales department into terms used in the finance department
- Assisting in the crucial step of translating high-level business requirements to technical specifications
- Helping upgrade from legacy BI systems to modern platforms like Microsoft Fabric. In a related use case, Amazon used Generative AI to upgrade their Java codebase to Java 17, which saved them an estimated 4,500 developer years!
- Documentation: Good documentation is essential for ensuring semantic models provide maximum value to business users and that there's a shared understanding across the organization. LLMs can help streamline and automate this often tedious process. Darren Gosbell demonstrated how ChatGPT could be used to automatically generate measure descriptions for Power BI and Analysis Services models (see the sketch after this list). Specifically, it can help with:
- Suggesting field descriptions
- Suggesting comments for code
- Annotating fields with metadata, which can be used as context for other LLM applications
- Generating flow diagrams (using, e.g., a JavaScript diagramming library) to visualize logic dependencies.
- Drafting or summarizing documentation (though remember to validate!)
- Query Assistance: Chat LLMs can help with understanding complex queries and suggesting fixes. However, this is an area we must approach with caution, as we'll explore in the next section.
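To make the documentation use case concrete, here is a minimal sketch of drafting measure descriptions with a chat LLM. This is not Darren Gosbell's exact implementation; the `openai` package, the model name, and the two measure definitions are illustrative assumptions. In practice you would export the measures from your model and write the generated descriptions back only after review.

```python
# Minimal sketch: drafting measure descriptions with a chat LLM.
# Assumes the `openai` Python package (v1+) and an OPENAI_API_KEY environment
# variable; the model name and measure definitions are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

measures = [
    {"name": "Total Sales", "dax": "SUM(Sales[Amount])"},
    {"name": "Sales YoY %", "dax": "DIVIDE([Total Sales] - [Total Sales LY], [Total Sales LY])"},
]

for measure in measures:
    prompt = (
        "Write a one-sentence, business-friendly description of this DAX measure.\n"
        f"Name: {measure['name']}\nDefinition: {measure['dax']}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model will do
        messages=[{"role": "user", "content": prompt}],
    )
    print(measure["name"], "->", response.choices[0].message.content)
```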
Limitations of Chat LLMs
While Chat LLMs have many valuable use cases, as illustrated, they still have several important limitations:
- Limited Training Data: While LLMs are trained on huge swaths of the internet, there might not be enough DAX training data compared to more popular languages like Python. This means their performance with DAX might not be as good, particularly as newer DAX functions or functionality (like INFO functions or DAX User-Defined Functions) are released with newer versions of Analysis Services.
- Hallucinations: LLMs are trained to predict the most likely next word given the context, but there's no guarantee that this output will be factually correct. They often produce content that sounds plausible but isn't necessarily accurate. Therefore, it's extremely important always to verify what these models output, especially if you're not an expert in the domain.
- Limited Reasoning Abilities: Chat LLMs operate similarly to what psychologist Daniel Kahneman calls "System 1 thinking" - fast, automatic thinking that states what comes to mind without deeper reflection. This puts strong limits on the types of tasks the model can handle.
Reasoning LLMs
Reasoning LLMs have been trained to "think out loud," allowing them to solve much more complicated problems requiring math, coding, and reasoning. Benchmarks measuring coding abilities show a significant performance jump when comparing reasoning LLMs to standard LLMs.
These added capabilities unlock new use cases, particularly for checking semantic models' consistency and quality. For instance, you can feed in measures and the associated semantic model, and a reasoning LLM could identify duplicated measures or broken logic because it can think more deeply about the model.
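As a rough illustration, the sketch below packages a set of measure definitions into a single prompt for a reasoning model to review. The model name and the way the measures are exported are assumptions; the point is simply that the model receives the full set of definitions to reason over.

```python
# Sketch: asking a reasoning LLM to review measures for duplicates and broken logic.
# The `openai` package, model name, and measure export are assumptions for illustration.
import json
from openai import OpenAI

client = OpenAI()

# In practice, these could be exported from the semantic model
# (e.g., via Tabular Editor or the DAX INFO functions).
measures = {
    "Total Sales": "SUM(Sales[Amount])",
    "Sales Amount": "SUMX(Sales, Sales[Amount])",  # likely duplicates Total Sales
    "Margin %": "DIVIDE([Profit], [Total Sales])",
}

prompt = (
    "You are reviewing measures from a Power BI semantic model. "
    "Identify measures that duplicate each other or contain broken logic, "
    "and briefly explain your reasoning.\n\n" + json.dumps(measures, indent=2)
)

review = client.chat.completions.create(
    model="o3-mini",  # assumption: substitute whichever reasoning model you use
    messages=[{"role": "user", "content": prompt}],
)
print(review.choices[0].message.content)
```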
Another advantage is that reasoning LLMs can serve as a second line of attack. If you've tried and failed to solve a problem with a normal chat LLM, you can try the same problem with a reasoning LLM (or potentially a group of different LLMs), which might improve performance.
Limitations of reasoning LLMs
- Cost: The main cost for LLMs is processing and generating text. When models think for longer, they use more processing power, which can dramatically increase costs. When OpenAI beat top competitive programmers, the cost was more than $1000 per query.
- No Real-World Access: Reasoning LLMs still lack access to code outputs and real-time information. They rely on a human in the loop to run the code and report the output back, and their training data might be outdated.
LLM Agents
LLM agents are language models that work in a loop with access to tools that can interact with the real world. This could include a code interpreter that allows the language model to execute the code it writes, or direct access to APIs that provide up-to-date information and potentially direct queries to semantic models.
The simplest kind of agent is an LLM with web search. This setup provides access to more up-to-date information, which can help the LLM provide grounded answers. Many people might have tried interacting with this type of agent through, e.g., Perplexity.
The capability to utilize tools unlocks entirely new use cases. For instance, agentic LLMs could serve as automatic validators of semantic models or quality assurance tools. They could perform adversarial testing of the model to check for data quality issues and errors by realistically loading and querying semantic models. Beyond catching logical errors, LLM agents could, for instance, detect referential integrity issues by interactively querying the data.
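As a sketch of what such a tool could look like, the function below wraps the Power BI REST API's executeQueries endpoint so an agent can run the DAX it proposes and inspect the results. The dataset ID, token handling, and the referential-integrity check are illustrative assumptions, not a complete agent.

```python
# Sketch of a "tool" an LLM agent could call to execute DAX against a published
# semantic model via the Power BI REST API. The dataset ID, token handling, and
# the example query are illustrative assumptions.
import requests

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"

def run_dax(dataset_id: str, dax_query: str, access_token: str) -> dict:
    """Execute a DAX query and return the JSON result for the agent to inspect."""
    response = requests.post(
        f"{POWER_BI_API}/datasets/{dataset_id}/executeQueries",
        headers={"Authorization": f"Bearer {access_token}"},
        json={"queries": [{"query": dax_query}]},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

# An example check an agent might generate: count fact rows with no matching
# dimension key (a referential integrity issue).
orphan_check = """
EVALUATE
ROW(
    "OrphanedSalesRows",
    COUNTROWS(FILTER(Sales, ISBLANK(RELATED(Customer[CustomerKey]))))
)
"""
```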
Furthermore, you could build a self-service BI agent where business users can interact with the agent to solve complex queries instead of having to go to the BI department.
Limitations of LLM Agents
- Early Stage Technology: There's still uncertainty about how best to implement agents, and their performance is limited by the underlying models, which don't always perform well and can still get confused with long contexts.
- Complexity and Security: The increased complexity of agents adds challenges to ensuring secure and reliable deployment. If not careful, agents can easily overwhelm systems with a barrage of queries and dramatically increase the attack surface for adversarial actors.
How semantic models can improve LLMs
So far, we've talked about how different types of LLMs can improve the workflows associated with semantic models. But semantic models also hold great promise for improving LLMs.
One significant way is that semantic models can provide structured information to ground the knowledge of large language models. An experiment referenced by Cube.dev showed that providing LLMs with a semantic model boosted the accuracy of LLM-generated queries on a database by 20% (to 100%) on a subset of questions.
This happens because semantic models can act as a source of truth and constrain the generations of the LLM to ensure that the code it writes is based on actual relationships and relies on available measures, which can dramatically reduce hallucinations.
Imagine that you want to ask an LLM to calculate which of your markets experienced the highest growth last year. Without a semantic model, the LLM would struggle to know the names of the relevant tables as well as which columns and measures can answer the question. With access to a semantic model, this becomes a much easier task for the LLM as it can ground its response in actual relationships.
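A minimal sketch of this grounding pattern: include the model's tables, relationships, and measures in the prompt and instruct the LLM to use only those. The metadata string and model name below are illustrative assumptions; in practice you might extract them from the model's TMDL/TMSL definition.

```python
# Sketch: grounding query generation in semantic model metadata.
# The metadata string and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

model_context = """
Tables: Sales(Amount, MarketKey, OrderDate), Market(MarketKey, MarketName), Date(Date, Year)
Relationships: Sales[MarketKey] -> Market[MarketKey]; Sales[OrderDate] -> Date[Date]
Measures: [Total Sales] = SUM(Sales[Amount]); [Sales YoY %] = year-over-year growth of [Total Sales]
"""

question = "Which market experienced the highest growth last year?"

response = client.chat.completions.create(
    model="gpt-4o",  # assumption
    messages=[
        {
            "role": "system",
            "content": "Write a DAX query that answers the user's question. "
                       "Use only the tables, columns, and measures listed below.\n"
                       + model_context,
        },
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```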
As Enterprise Knowledge notes, "While LLMs are a powerful technology, they can result in an inaccurate, generalist approach" without the structure provided by a semantic layer.
This creates a virtuous cycle: better semantic models and metadata improve the performance of the LLM, which can in turn improve the documentation and workflows of the semantic model. Microsoft's Fabric Copilot demonstrates this by using descriptions and synonyms from the semantic model to improve its query writing capabilities.
Important considerations for integrating LLMs and semantic models in practice

Table 1: Summary of important considerations
Now that we've discussed the mutual benefits of semantic models and LLMs, let's consider six important aspects of integrating them in practice.
Security and privacy
Many LLMs are complex to host yourself, which means many organizations rely on APIs to access and deploy language models. It's important to select a trusted provider, such as Microsoft Azure OpenAI, and follow best practices. We'll dive deeper into security in a future blog post.
Workflow integration
Ensuring that LLMs fit with your existing workflows is crucial. Treating the LLM as a microservice can be very useful, which also requires appropriate quality assurance, testing, and monitoring.
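As a sketch of the microservice idea, the snippet below wraps a single documentation use case behind an internal HTTP endpoint so it can be versioned, tested, and monitored like any other service. FastAPI, the endpoint shape, and the model name are illustrative choices, not a prescribed architecture.

```python
# Sketch: exposing an LLM use case as an internal microservice.
# FastAPI, the endpoint shape, and the model name are illustrative assumptions.
import logging

from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI()
logger = logging.getLogger("llm_service")

class DescribeRequest(BaseModel):
    measure_name: str
    dax_expression: str

@app.post("/describe-measure")
def describe_measure(request: DescribeRequest) -> dict:
    """Draft a measure description and log the call so usage can be monitored."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{
            "role": "user",
            "content": f"Describe this DAX measure in one sentence: "
                       f"{request.measure_name} = {request.dax_expression}",
        }],
    )
    logger.info("describe-measure called for %s", request.measure_name)
    return {"description": completion.choices[0].message.content}
```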
User training
Empowering users to leverage AI is key for successful adoption. Both technical and business users in your organization need to understand where AI shows potential and where extra caution is required. For instance, when doing simple translations, AI can be very strong, but when generating complex DAX queries, verification and checking are essential. As Kurt Buhler advises, "Ensure that you validate any DAX query output from an AI model".
A good approach is to ensure there are always humans in the loop and strong transparency about where in the organization AI is generating solutions. You need to ensure that LLMs remain copilots and don't inadvertently steer the organization in the wrong direction.
Cost (and benefit) tracking
It's important to track both the costs and benefits of these solutions. The main costs for LLMs are processing and generating text. It's good practice to measure this at as granular a level as possible, ideally at the use case level, to ensure that LLMs are providing real benefits.
If possible, benchmark this against the cost of not using LLMs. For instance, if generating 100 pages of documentation costs $100 in API calls, how much would it have cost in human expert time? Expensive specialists likely have many other priorities competing for their time.
Cost analysis is also important for deciding which models to use. For some tasks, reasoning models might be too expensive, but chat LLMs might do the job just as well.
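As a starting point, here is a minimal sketch of tracking token usage and estimated cost per use case, using the token counts returned by the API. The prices and default model name are illustrative assumptions; check your provider's current pricing.

```python
# Sketch: tracking LLM usage and estimated cost at the use-case level.
# Prices and the default model name are illustrative assumptions.
from collections import defaultdict
from openai import OpenAI

client = OpenAI()
usage_by_use_case = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})

PRICE_PER_1K = {"prompt": 0.0005, "completion": 0.0015}  # assumption: check current pricing

def tracked_completion(use_case: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    """Call the LLM and record token usage under the given use case."""
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    usage_by_use_case[use_case]["prompt_tokens"] += response.usage.prompt_tokens
    usage_by_use_case[use_case]["completion_tokens"] += response.usage.completion_tokens
    return response.choices[0].message.content

def estimated_cost(use_case: str) -> float:
    """Rough cost estimate in dollars for a single use case."""
    usage = usage_by_use_case[use_case]
    return (usage["prompt_tokens"] / 1000 * PRICE_PER_1K["prompt"]
            + usage["completion_tokens"] / 1000 * PRICE_PER_1K["completion"])
```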
Experimentation
AI is a rapidly developing field, so experimentation is key. AI researcher Ethan Mollick talks about the "jagged frontier" of AI capabilities. In a study exploring a range of realistic business tasks, he and his collaborators found drastically different performance across tasks that was hard to predict.
The only way to ensure you get the benefits of AI is to try different use cases and see what works. It's therefore important to distinguish between the costs of running solutions in production and the costs associated with learning. Budgeting enough room for this learning is essential for staying on top of developments.
Keep humans in the loop
LLMs work best when guided by capable humans. As noted, the exact areas where LLMs excel can be hard to predict. It’s therefore crucial that humans are kept in the loop and have full insight into all processes.
While user training can get you some of the way to looping in your co-workers, well-designed processes are equally important. One should ensure that there are procedures in place for manually verifying outputs from LLMs. For instance, when translating measures, one should get a native speaker to manually validate a subset to ensure the outcome is sound.
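For example, a simple way to operationalize that check is to draw a random sample of translations for a native speaker to review. The translations dictionary and sample size below are illustrative.

```python
# Sketch: sampling LLM-translated measure names for human review.
# The translations dictionary and sample size are illustrative.
import random

translations = {
    "Total Sales": "Samlet salg",
    "Sales YoY %": "Salg år-over-år %",
    "Margin %": "Margin %",
}

SAMPLE_SIZE = 2  # in practice, size the sample to your risk tolerance
for name in random.sample(sorted(translations), k=SAMPLE_SIZE):
    print(f"Review: '{name}' -> '{translations[name]}'")
```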
Conclusion and what comes next
In this post, we've explored how two semantic technologies - LLMs and semantic models - can synergistically improve one another, potentially leading to more value for both developers and the business. We looked at three different types of LLMs and how each unlocks new use cases for improving semantic model workflows, from writing documentation to acting as automated testers.
We also explored how semantic models can help ground LLMs in up-to-date and validated contexts, improve their performance, and reduce hallucinations. Finally, we discussed important considerations for implementing LLMs in organizations.
In future blog posts, we'll explore:
- Techniques for querying semantic models with LLMs
- A practical guide to automating documentation with LLMs
- Important security considerations for integrating LLMs with semantic models
We look forward to hearing about your experiences with these technologies!
Author description
Jonathan Rystrøm is a PhD candidate at the Oxford Internet Institute, University of Oxford. He researches how LLMs and Agents can be deployed in a fair and responsible way in the Public Sector.