Windows ML Archives - Windows Developer Blog

Windows ML is generally available: Empowering developers to scale local AI across Windows devices

Athima Chansanchai, Writer — Tue, 23 Sep 2025 19:00:03 +0000

The future of AI is hybrid, utilizing the respective strengths of cloud and client while harnessing every Windows device to achieve more. At Microsoft, we are reimagining what’s possible by bringing powerful AI compute directly to Windows devices, unlocking a new era of intelligence that runs where you are. With groundbreaking advancements in silicon, a modernized software stack and deep OS integration, Windows 11 is transforming into the world’s most open and capable platform for local AI.

Today we are excited to share that Windows ML is now generally available for production use to assist developers with deploying production experiences in the evolving AI landscape. First introduced at Build 2025, Windows ML is the built-in AI inferencing runtime optimized for on-device model inference and streamlined model dependency management across CPUs, GPUs and NPUs, serving as the foundation for Windows AI Foundry and utilized by Foundry Local to enable expanded silicon support which is being released today.

By harnessing the power of CPUs, GPUs and NPUs from our vibrant silicon partner ecosystem and building on ONNX’s strong momentum, Windows ML empowers developers to deliver real-time, secure and efficient AI workloads, right on the device. This ability to run models locally enables developers to build AI experiences that are more responsive, private and cost-effective, reaching users across the broadest range of Windows hardware.

https://youtu.be/Mow9UY_9Ab4

Bring your own model and deploy efficiently across silicon – securely and locally on Windows

Windows ML is compatible with ONNX Runtime (ORT), allowing developers to utilize familiar ORT APIs and enabling easy transition for existing production workloads. Windows handles distribution and maintenance of ORT and the Execution Providers, taking that responsibility on from the app developer. Execution Providers (EPs) are the bridge between the core runtime and the powerful and diverse silicon ecosystem, enabling independent optimization of model execution on the different chips from AMD, Intel, NVIDIA and Qualcomm.

With ONNX as its model format, Windows ML integrates smoothly with current models and workflows. Developers can easily use their existing ONNX models or convert and optimize their source PyTorch models through the AI Toolkit for VS Code and deploy across Windows 11 PCs.

Figure – Windows ML Stack Diagram

While AI developers work with various models, Windows ML acts as a hardware abstraction layer offering several benefits:

Simplified Deployment: Our infrastructure APIs allow developers to support various hardware architectures without multiple app builds by leveraging execution providers available on the device or by dynamically downloading them. Developers also have the flexibility to precompile their models ahead-of-time (AOT) for a streamlined end-user experience.

Reduce App Overhead: Windows ML automatically detects the user’s hardware and downloads the appropriate execution providers, eliminating the need to bundle the runtime or EPs in a developer’s application. This streamlined approach saves developers tens to hundreds of megabytes in app size when targeting a broad range of devices.

Compatibility: Through collaboration with our silicon partners, Windows ML aims to maintain conformance and compatibility, supporting ongoing updates while ensuring model accuracy across different builds through a certification process.

Advanced Silicon Targeting: Developers can assign device policies to optimize for low power (NPU), high performance (GPU) or specify the silicon used for a model.
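The device-policy idea above can be sketched with the ONNX Runtime Python API, which accepts an ordered list of execution providers when a session is created. This is a minimal illustration, not the Windows ML device-policy API: the policy names (`low_power`, `high_performance`) and the helper functions are hypothetical, and the provider lists are deliberately incomplete assumptions about which EP names might be registered on a given device.

```python
from typing import Iterable, List

CPU_EP = "CPUExecutionProvider"

# Hypothetical policy table for this sketch only. The EP names are the
# identifiers commonly used with ONNX Runtime; which of them are actually
# present depends on the device and on what has been installed.
POLICY_ORDER = {
    # Prefer NPU-oriented providers first, then GPU.
    "low_power": [
        "QNNExecutionProvider",
        "VitisAIExecutionProvider",
        "OpenVINOExecutionProvider",
        "DmlExecutionProvider",
    ],
    # Prefer GPU-oriented providers first, then NPU.
    "high_performance": [
        "DmlExecutionProvider",
        "OpenVINOExecutionProvider",
        "QNNExecutionProvider",
        "VitisAIExecutionProvider",
    ],
}


def order_providers(available: Iterable[str], policy: str) -> List[str]:
    """Return the available providers in policy order, with CPU as the final fallback."""
    present = set(available)
    ordered = [name for name in POLICY_ORDER[policy] if name in present]
    ordered.append(CPU_EP)
    return ordered


def make_session(model_path: str, policy: str):
    """Create an ONNX Runtime session using the policy-ordered provider list."""
    import onnxruntime as ort  # imported lazily; requires the onnxruntime package

    providers = order_providers(ort.get_available_providers(), policy)
    return ort.InferenceSession(model_path, providers=providers)
```

On a device that reports only the QNN, DirectML and CPU providers, `order_providers(..., "low_power")` puts QNN first, while `"high_performance"` puts DirectML first. In both cases the CPU provider stays last as the fallback.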

For a more technical deep dive on Windows ML, learn more here.

Windows ML, optimized for the latest hardware in collaboration with our silicon partners

Windows 11 has a diverse hardware ecosystem that includes AMD, Intel, NVIDIA and Qualcomm and spans the CPU, GPU and NPU. Consumers can choose from a range of Windows PCs and this variety empowers developers to create innovative local AI experiences. We worked closely with our silicon partners to ensure that Windows ML can fully leverage their latest CPUs, GPUs and NPUs for AI workloads. Silicon partners build and maintain execution providers that Windows ML distributes, manages and registers to run AI workloads performantly on-device, serving as a hardware abstraction layer for developers and a way to get optimal performance for each specific silicon.

AMD has integrated Windows ML support across their Ryzen AI platform, enabling developers to harness the power of AMD silicon via AMD’s dedicated Vitis AI execution provider on NPU, GPU and CPU. Learn more.

“By integrating Windows ML support across our Ryzen AI platform, AMD is making it easier for developers to harness the combined power of our CPUs, GPUs and NPUs. Together with Microsoft, we’re enabling scalable, efficient and high-performance AI experiences that run seamlessly across the Windows ecosystem.” - John Rayfield, corporate vice president, Computing and Graphics Group, AMD

Intel’s EP combines OpenVINO AI software performance and efficiency with Windows ML, empowering AI developers to easily choose the optimal XPU (CPU, GPU or NPU) for their AI workloads on Intel Core Ultra processor powered PCs. Learn more.

“Intel’s collaboration with Microsoft on Windows ML empowers developers to effortlessly deploy their custom AI models and applications across CPUs, GPUs and NPUs on Intel’s AI-powered PCs. With the OpenVINO framework, Windows ML accelerates the delivery of cutting-edge AI applications, enabling faster innovation with unmatched efficiency, unlocking the full potential of Intel Core Ultra processors.” - Sudhir Tonse Udupa, vice president, AI PC Software Engineering, Intel

NVIDIA’s TensorRT for RTX EP enables AI models to be executed on NVIDIA GeForce RTX and RTX PRO GPUs using NVIDIA’s dedicated Tensor Core libraries for maximum performance. This lightweight EP generates optimized inference engines (instructions on how to run the AI model) for the system’s specific RTX GPU. Learn more.

“Windows ML with TensorRT for RTX delivers over 50% faster inferencing on NVIDIA RTX GPUs compared to DirectML in an easy-to-deploy package, enabling developers to scale generative AI across over 100 million Windows devices. This combination of speed and reach empowers developers to create richer AI experiences for Windows users.” - Jason Paul, vice president, Consumer AI, NVIDIA

Qualcomm Technologies and Microsoft worked together to optimize Windows ML AI models and apps for the Snapdragon X Series NPU using the Qualcomm Neural Network Execution Provider (QNN EP) as well as GPU and CPU through integration with ONNX Runtime EPs. Learn more here.

“With Windows ML now live and the preview of Foundry Local, this is a pivotal moment for AI developers on Windows. The new Windows ML runtime not only delivers cutting-edge on-device inference but also simplifies deployment, enabling developers to fully harness advanced AI processors on Snapdragon X Series platforms. Its unified framework and support for NPUs, GPUs and CPUs ensure exceptional performance and efficiency across Snapdragon Windows PCs. As agentic AI experiences become mainstream, our deep collaboration with Microsoft is accelerating innovation and bringing the best AI experiences to Windows Copilot+ PCs and soon to our next-generation Snapdragon X2 platform.” - Upendra Kulkarni, VP, Product Management, Qualcomm Technologies, Inc.

Enabling local AI in the Windows software ecosystem

While developing Windows ML, we prioritized feedback from app developers building AI-powered features. We previously worked with app developers to test the integration with Windows ML during public preview. Leading software app developers such as Adobe, BUFFERZONE, Dot Inc., McAfee, Reincubate, Topaz Labs and Wondershare are among many others working on adopting Windows ML in their upcoming releases, accelerating the proliferation of local AI capabilities across a broad spectrum of applications. By leveraging Windows ML, our software partners can focus on building unique AI-powered features without worrying about hardware differences. Their early adoption and feedback show strong momentum toward local AI, enabling faster development and unlocking new local AI experiences across a variety of use cases:

Adobe Premiere Pro and Adobe After Effects – accelerated semantic search of content in the media library, tagging audio segments by type, and detecting scene edits, all powered by the local NPU in upcoming releases, with plans to progressively migrate the full library of existing on-device models to Windows ML.

BUFFERZONE enables real-time secure web page analysis, protecting users from phishing and fraud without sending sensitive data to the cloud.

Camo by Reincubate leverages real-time image segmentation and other ML techniques to improve webcam video quality when streaming and presenting while using the NPU across all silicon providers.

Dot Vista by Dot Inc. supports hands-free voice control and optical character recognition (OCR) for accessibility scenarios, including deployments in healthcare environments using NPUs in Copilot+ PCs.

Filmora by Wondershare uses AI-powered body effects optimized for NPU acceleration on AMD, Intel and Qualcomm platforms, including real-time preview and application of body effects such as Lightning Twined, Neon Ring and Particle Surround.

McAfee uses automatic detection of deepfake videos and other scam vectors that can be encountered on social networks.

Topaz Photo by Topaz Labs is a professional-grade image enhancement application that lets photographers sharpen details, restore focus and adjust levels on every shot they take - all powered by AI.

Simplified tooling for Windows ML

Developers can take advantage of Windows ML by starting with a robust set of tools for simplified model deployment. AI Toolkit for VS Code provides powerful tools for model and app preparation, including ONNX conversion from PyTorch, quantization, optimization, compilation and evaluation – all in one place. These features make it easier to prepare and deploy efficient models with Windows ML, eliminating the need for multiple builds and complex logic. Starting today, developers can also try custom AI models with Windows ML in AI Dev Gallery, which offers an interactive workspace that makes it easier to discover and experiment with AI-powered scenarios using local models.

Get started today

With Windows ML now generally available, Windows 11 provides a local AI inference framework that’s ready for production apps. Windows ML is included in the Windows App SDK (starting with version 1.8.1) and supports all devices running Windows 11 24H2 or newer. To get started developing with Windows ML:

Update your project to use the latest Windows App SDK
Call the Windows ML APIs to initialize EPs, and then load any ONNX model and start inferencing in just a few lines of code. For detailed tutorials, API reference and sample code, visit ms/TryWinML
For interactive samples of custom AI models with Windows ML, try the AI Dev Gallery at ms/ai-dev-gallery
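The "load any ONNX model and start inferencing" step can be sketched with the ONNX Runtime Python API, which the post describes Windows ML as compatible with. This sketch does not show the Windows ML call that initializes EPs, since the post does not name it. The helpers `concrete_shape` and `run_once` are hypothetical, and the code assumes every model input is float32 and that the `onnxruntime` and `numpy` packages are installed.

```python
from typing import List, Optional, Sequence, Union

# ONNX Runtime reports dynamic dimensions as None or as a symbolic name.
Dim = Union[int, str, None]


def concrete_shape(shape: Sequence[Dim], dynamic_size: int = 1) -> List[int]:
    """Replace dynamic dimensions with a fixed size so a dummy tensor can be built."""
    return [d if isinstance(d, int) and d > 0 else dynamic_size for d in shape]


def run_once(model_path: str, providers: Optional[List[str]] = None):
    """Load an ONNX model and run it once on zero-filled inputs."""
    # Imported lazily so the shape helper can be used without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        model_path, providers=providers or ["CPUExecutionProvider"]
    )
    feed = {}
    for inp in session.get_inputs():
        # Assumption: all inputs are float32 tensors.
        feed[inp.name] = np.zeros(concrete_shape(inp.shape), dtype=np.float32)
    # Passing None requests every model output.
    return session.run(None, feed)
```

A call such as `run_once("model.onnx")` (the file name is a placeholder) returns a list with one array per model output. A real app would replace the zero-filled tensors with preprocessed input data.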

Develop local AI solutions with Windows ML

Windows development has always been about enabling developers to do more with software and hardware. Windows ML lets both new and experienced developers build AI-powered apps easily, focusing on innovation and reducing app size. We at Microsoft are excited to see what new experiences you will create using Windows ML across Windows 11 PCs. The era of intelligent, AI-enhanced Windows apps is here – and it’s available to every developer. Let’s usher in this new wave of innovation together with Windows ML!

Editor’s note — September 24, 2025 — Updated to reflect announcements at Qualcomm's Snapdragon Summit on Sept. 24 and correct link for McAfee.

Extending the Reach of Windows ML and DirectML

stefaniesaf — Wed, 18 Mar 2020 17:30:38 +0000

Since the initial release, Windows ML has powered numerous Machine Learning (ML) experiences on Windows. Delivering reliable, high-performance results across the breadth of Windows hardware, Windows ML is designed to make ML deployment easier, allowing developers to focus on creating innovative applications. Windows ML is built upon ONNX Runtime to provide a simple, model-based, WinRT API optimized for Windows developers. This API enables you to take your ONNX model and seamlessly integrate it into your application to power ML experiences. Layered below the ONNX Runtime is the DirectML API for cross-vendor hardware acceleration. DirectML is part of the DirectX family and provides full control for real-time, performance-critical scenarios. This end-to-end stack provides developers with the ability to run inferences on any Windows device, regardless of the machine’s hardware configuration, all from a single and compatible codebase.

Figure 1 – The Windows AI Platform stack

Windows ML is used in a variety of real-world application scenarios. The Windows Photos app uses it to help organize your photo collection for an easier and richer browsing experience. The Windows Ink stack uses Windows ML to analyze your handwriting, converting ink strokes into text, shapes, lists and more. Adobe Premiere Pro offers a feature that will take your video and crop it to the aspect ratio of your choice, all while preserving the important action in each frame. With the next release of Windows 10, we are continuing to build on this momentum and are further expanding to support more exciting and unique experiences. The interest and engagement from the community provided valuable feedback that allowed us to focus on what our customers need most. Today, we are pleased to share with you some of that important feedback and how we are continually working to build from it.

Bringing Windows ML and DirectML to More Places

Today, Windows ML is fully supported as a built-in Windows component on Windows 10 version 1809 (October 2018 Update) and newer. Developers can use the corresponding Windows Software Development Kit (SDK) and immediately begin leveraging Windows ML in their application. For developers that want to continue using this built-in version, we will continue to update and innovate Windows ML and provide you with the feature set and performance you need with each new Windows release. A common piece of feedback we’ve heard is that developers today want the ability to ship products and applications that have feature parity to all of their customers. In other words, developers want to leverage Windows ML on applications targeting older versions of Windows and not just the most recent. To support this, we are going to make Windows ML available as a stand-alone package that can be shipped with your application. This redistributable path enables Windows ML support for CPU inference on Windows versions 8.1 and newer, and GPU hardware-acceleration on Windows 10 1709 and newer. Going forward, with each new update of Windows ML, there will be a corresponding redist package, with matching new features and optimizations, available on GitHub. Developers will find that with either option they choose, they will receive an official Windows offering that is extensively tested, guaranteeing reliability and high performance.
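The support matrix described above can be written down as a small lookup. This is an illustrative sketch only: the function and the path labels are hypothetical, and the build numbers are not from the post. They assume Windows 8.1 is version 6.3, Windows 10 version 1709 is build 16299, and version 1809 is build 17763.

```python
from typing import List


def windows_ml_paths(major: int, minor: int, build: int) -> List[str]:
    """List the Windows ML delivery options available for a given OS version."""
    version = (major, minor, build)
    paths = []
    # Built-in component: Windows 10 version 1809 (build 17763) and newer.
    if version >= (10, 0, 17763):
        paths.append("built-in")
    # Redistributable package, CPU inference: Windows 8.1 (6.3) and newer.
    if version >= (6, 3, 0):
        paths.append("redist-cpu")
    # Redistributable package, GPU acceleration: Windows 10 1709 (build 16299) and newer.
    if version >= (10, 0, 16299):
        paths.append("redist-gpu")
    return paths
```

For example, a Windows 8.1 machine (6.3, build 9600) gets only the CPU redistributable path, while a machine on Windows 10 version 1709 gets both redistributable paths but not the built-in component.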

Windows ML, ONNX Runtime, and DirectML

In addition to bringing Windows ML support to more versions of Windows, we are also unifying our approach with Windows ML, ONNX Runtime, and DirectML. At the core of this stack, ONNX Runtime is designed to be a cross-platform inference engine. With Windows ML and DirectML, we build around this runtime to offer a rich set of features and hardware scaling, designed for Windows and the diverse hardware ecosystem. We understand the complexities developers face in building applications that offer a great customer experience, while also reaching their wide customer base. In order to provide developers with the right flexibility, we are bringing the Windows ML API and a DirectML execution provider to the ONNX Runtime GitHub project. Developers can now choose the API set that works best for their application scenarios and still benefit from DirectML's high-performance and consistent hardware acceleration across the breadth of devices supported in the Windows ecosystem. In GitHub today, the Windows ML and DirectML preview is available as source, with instructions and samples on how to build it, as well as a prebuilt NuGet package for CPU deployments.

Are you a Windows app developer that needs a friendly WinRT API that will integrate easily with your other application code and is optimized for Windows devices? Windows ML is a perfect choice for that. Do you need to build an application with a single code-path that can work across other non-Windows devices? The ONNX Runtime cross-platform C API can provide that.

Figure 2 – newly layered Windows AI and ONNX Runtime

Developers already using the ONNX Runtime C-API and who want to check out the DirectML EP (Preview) can follow these steps.

Experience it for yourself

We are already making great progress on these new features. You can get access to the preview of Windows ML and DirectML for the ONNX Runtime here. We invite you to join us on GitHub and provide feedback at AskWindowsML@microsoft.com. The official Windows ML redistributable package will be available on NuGet in May 2020. As always, we greatly appreciate all the support from the developer community. We'll continue to share updates as we make more progress with these upcoming features.
