Using Azure AI Foundry Model Router in your AI projects.

Some months ago, Microsoft introduced an interesting service called Model Router into the Azure AI Foundry toolset.

The Azure AI Foundry Model Router is a deployable AI chat model within the Azure AI Foundry platform that intelligently selects the best large language model (LLM) to handle a given prompt in real time. It acts as a “meta-model” or orchestrator, evaluating factors like query complexity, cost, and performance to route requests to the most suitable underlying model, optimizing both quality and cost efficiency.

The Model Router assesses each prompt and directs it to an appropriate model, such as smaller, cost-effective models for simpler tasks or larger, reasoning-focused models for complex queries. By routing to smaller models when sufficient, it can achieve up to 60% cost savings compared to using high-end models like GPT-4.1 directly, while maintaining similar accuracy.

The Model Router is packaged as a single model deployment and is consumed through the standard Chat Completions API, just like a single model such as GPT-4. Developers interact with it through a single endpoint, which decouples application logic from specific models and enhances the flexibility and scalability of an AI solution.
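To make the single-endpoint idea concrete, here is a minimal C# sketch (assuming the Azure.AI.OpenAI 2.x SDK; the endpoint URL, key variable, and the deployment name model-router are placeholders for your own values):

```csharp
using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat;

// Endpoint and key are placeholders: replace with your own values.
var client = new AzureOpenAIClient(
    new Uri("https://my-resource.openai.azure.com/"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!));

// The deployment name targets the Model Router deployment, not a specific model.
ChatClient chat = client.GetChatClient("model-router");

ChatCompletion completion = chat.CompleteChat(
    new UserChatMessage("What is Business Central?"));

// The response's model field reports which underlying model served the request.
Console.WriteLine($"Routed to: {completion.Model}");
Console.WriteLine(completion.Content[0].Text);
```

Note that the application never names an underlying model: the deployment name refers to the router itself.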

How does Model Router work?

The Model Router analyzes the prompt's complexity, the reasoning required, and other factors in real time, then chooses the best AI model to satisfy the user's prompt from a fixed set of underlying AI models. The model set depends on the router's version. At the time of writing this post, the Model Router version is 2025-05-19 and the underlying models are the following:

  • gpt-4.1-nano (2025-04-14)
  • gpt-4.1-mini (2025-04-14)
  • gpt-4.1 (2025-04-14)
  • o4-mini (2025-04-16)

The selected model processes the prompt and the response is returned via the standard Chat Completions API, with the chosen model identified in the response's model field.

Please remember that each Model Router version uses a specific set of models. Auto-updates of the Model Router (by Microsoft) may change the underlying models, potentially affecting performance or costs, so always monitor your usage.

Another important thing to remember: the Model Router's context window (the maximum number of tokens that can be processed in a single API call, including both the input prompt and the output response) is limited by the smallest underlying model (200,000 input tokens and 32,768 output tokens in the 2025-05-19 version). When the Model Router receives a prompt, it evaluates and routes it to the most suitable model. If the prompt (or prompt plus expected output) exceeds the context window of the chosen model, the API call may fail or be rejected because the selected model cannot handle the request.
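If you want to guard against that in code, you can catch the rejection explicitly. A hedged sketch, reusing the ChatClient (chat) from the snippet above (the exact exception type and status code may vary by SDK version, and veryLongPrompt is a placeholder):

```csharp
using System.ClientModel;
using OpenAI.Chat;

// Placeholder input that might exceed the routed model's context window.
string veryLongPrompt = File.ReadAllText("big-input.txt");

try
{
    ChatCompletion completion = chat.CompleteChat(new UserChatMessage(veryLongPrompt));
    Console.WriteLine(completion.Content[0].Text);
}
catch (ClientResultException ex) when (ex.Status == 400)
{
    // An over-limit request is typically rejected with a 400-level error;
    // log it and consider truncating or summarizing the input.
    Console.WriteLine($"Request rejected: {ex.Message}");
}
```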

How to deploy a Model Router?

To deploy a Model Router instance, open your Azure AI Foundry project and go into the Deployments section to deploy a new AI model:

In the AI model selection page, select model-router as the model to deploy (and give a name to your deployment):

When selecting the model-router deployment, you can also see its version and its underlying models (model names and versions). Click Confirm and the model-router instance will be deployed:

Once deployed, you can use your Model Router like any other Azure OpenAI model.

To test how it works, I've created a simple .NET application that talks to the model-router endpoint deployed earlier.
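The core of the test application looks roughly like this (a sketch reusing the ChatClient from the first snippet; the Usage property names are those of the OpenAI .NET 2.x SDK, and the prompts are the ones used in the tests below):

```csharp
using OpenAI.Chat;

// Test prompts; chat is the ChatClient pointing at the model-router deployment.
string[] testPrompts =
{
    "What is Business Central?",
    "Write a detailed explanation of how sales orders are handled in Business Central " +
    "and how sales statistics can be calculated."
};

foreach (string prompt in testPrompts)
{
    ChatCompletion completion = chat.CompleteChat(new UserChatMessage(prompt));

    // The model field reveals the router's choice; Usage reports token consumption.
    Console.WriteLine($"Prompt   : {prompt}");
    Console.WriteLine($"Routed to: {completion.Model}");
    Console.WriteLine($"Tokens   : {completion.Usage.InputTokenCount} in / {completion.Usage.OutputTokenCount} out");
}
```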

In the first test, I'm asking a simple question: what is Business Central? As you can see from the next image, the Model Router responds, routing the request to the gpt-4.1-nano-2025-04-14 model:

The Model Router determined that the question can easily be handled by the least capable model in its associated set of AI models, helping reduce costs (in the image you can see the input and output tokens used).

Now let’s do a second test, asking the following: write a detailed explanation of how sales orders are handled in Business Central and how sales statistics can be calculated:

As you can see, the Model Router now routed the request to the gpt-4.1-mini-2025-04-14 model.

Third test… let’s ask the following: Write an AL procedure that calculates the total of sales order lines amount for each country region code associated to the Customer record.

As you can see from the previous image (sorry if it’s small, but I wanted to report the full response), the Model Router again routed the request to the gpt-4.1-mini-2025-04-14 model, because it’s the cheapest model able to fully answer my question.

Fourth test… let’s ask the following question: Write a detailed analysis on how inventory costing should be handled in a Business Central project. Include must-have setup, mandatory configurations and suggestions for handling the activities in the warehouse.

As you can see from the above image, now the Model Router redirected the request to the o4-mini-2025-04-16 model, a more complex model able to reason and generate a more in-depth response.

The code of my AI solution is now totally independent of the AI model: it’s the Model Router that dynamically routes the AI requests to the most suitable model based on the user input. 🤩

What about Dynamics 365 Business Central? Can I use Model Router endpoints from AL?

I’ve used the Azure AI Foundry Model Router in many of my AI solutions with great success, but those solutions are mainly external .NET applications. The next step was to use the Model Router endpoint from AL code, using the standard Copilot Toolkit available in the AL language.

I created a very simple Copilot extension in Business Central with a custom endpoint (my model-router endpoint) to do exactly what the .NET application in the previous examples does (send a request to an AI model and receive its response).

The result was not what I was hoping for… 😖🫤

It seems that the AL Copilot SDK is not able to use the Model Router endpoint. Debugging the AI exception, this is what I saw:

This is something I need to investigate more in depth, but I think supporting the Model Router endpoint is something that should be considered.

This is another reason why, for complex AI solutions in Business Central, I usually prefer to decouple things and create a custom AI layer (which gives me all the freedom I need, like AI platform and model independence):
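To give an idea of what such a layer can look like, here is an illustrative C# sketch (all type and member names here are hypothetical, not the actual code of my solutions): application code depends only on an interface, and the Model Router is just one possible backend behind it.

```csharp
using OpenAI.Chat;

// Hypothetical abstraction: application code depends only on this interface,
// never on a specific AI platform or model.
public interface IAiCompletionService
{
    Task<AiAnswer> AskAsync(string prompt, CancellationToken ct = default);
}

public sealed record AiAnswer(string Text, string ModelUsed);

// One possible implementation, backed by the Model Router deployment.
public sealed class ModelRouterCompletionService : IAiCompletionService
{
    private readonly ChatClient _chat;

    public ModelRouterCompletionService(ChatClient chat) => _chat = chat;

    public async Task<AiAnswer> AskAsync(string prompt, CancellationToken ct = default)
    {
        ChatCompletion completion = await _chat.CompleteChatAsync(
            new ChatMessage[] { new UserChatMessage(prompt) },
            cancellationToken: ct);

        return new AiAnswer(completion.Content[0].Text, completion.Model);
    }
}
```

Swapping AI platform or model then only means providing a different implementation of the interface, with no changes to the consuming application code.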

How to monitor Model Router usage and performance?

To monitor the usage and performance of your Model Router deployment, go to its Monitoring page, where you can see all the metrics filtered by period:
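The same metrics can also be pulled programmatically with the Azure Monitor Query SDK. A sketch, assuming the Azure.Monitor.Query package; the resource ID is a placeholder and the metric names are assumptions (check the Monitoring page for the exact names exposed by your resource):

```csharp
using Azure.Identity;
using Azure.Monitor.Query;
using Azure.Monitor.Query.Models;

var metricsClient = new MetricsQueryClient(new DefaultAzureCredential());

// Placeholder resource ID of the Azure OpenAI resource hosting the router deployment.
string resourceId = "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<account>";

// Metric names are assumptions: verify them against your resource's metrics list.
MetricsQueryResult result = await metricsClient.QueryResourceAsync(
    resourceId,
    new[] { "ProcessedPromptTokens", "GeneratedTokens" },
    new MetricsQueryOptions { TimeRange = new QueryTimeRange(TimeSpan.FromDays(7)) });

// Print one total per metric per time bucket.
foreach (MetricResult metric in result.Metrics)
    foreach (MetricTimeSeriesElement series in metric.TimeSeries)
        foreach (MetricValue point in series.Values)
            Console.WriteLine($"{metric.Name} {point.TimeStamp:d}: {point.Total}");
```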

What about costs?

When you use Model Router today, you’re billed only for the underlying models as they’re invoked to respond to prompts; the model routing function itself doesn’t incur any extra charge.

But things are changing in a few days: starting August 1, 2025, Model Router usage will be charged as well.

Conclusion.

The Azure AI Foundry Model Router streamlines AI project development by intelligently routing requests to the most suitable AI model based on task requirements. It optimizes performance by balancing factors like cost, speed and accuracy across multiple AI models in a totally transparent way for the developer. Developers can integrate it to dynamically select models, ensuring efficient resource utilization. The router also supports seamless scalability, adapting to varying workloads in real-time.

If you’re deploying complex AI solutions for production workloads, I recommend giving it a try…

Original Post https://demiliani.com/2025/07/29/using-azure-ai-foundry-model-router-in-your-ai-projects/
