Microsoft Introduces OpenAI’s Smallest Open Model (gpt‑oss‑120b and gpt‑oss‑20b) for Windows Users.

Posted on August 10, 2025 by Mark Harrell

Introduction to the New Open‑Weight Models

OpenAI has placed its latest open‑weight models, gpt‑oss‑120b and gpt‑oss‑20b, onto Microsoft’s AI platforms. Developers can now pull these models into Azure AI Foundry or run them directly on Windows devices with Foundry Local.
The move gives full control over the model files, allowing tweaks, fine‑tuning, and deployment without a cloud‑only lock‑in.
Both models are built for real‑world workloads, delivering strong reasoning and tool use while staying efficient enough for single‑GPU or on‑device inference.
Read more about the launch on the official blog.

Why Open‑Weight Matters

Full visibility into each parameter.
Ability to apply LoRA, QLoRA, or other parameter‑efficient methods.
Easier to compress, quantize, or prune for memory‑constrained environments.
Option to export to ONNX, or to serve through Triton, for containerized deployment.

These benefits translate into faster iteration cycles. Teams that have tried the models report checkpoint updates in hours instead of weeks.
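
To make the appeal of parameter‑efficient methods concrete, here is a minimal sketch of the arithmetic behind LoRA: instead of updating a dense weight matrix, it trains two small low‑rank factors. The layer size and rank below are hypothetical values chosen only for illustration; they are not taken from the gpt‑oss architecture.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the dense weight matrix that the adapter augments."""
    return d_out * d_in


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8
adapter = lora_trainable_params(d_out, d_in, rank)
dense = full_params(d_out, d_in)
print(f"adapter: {adapter:,} params, dense: {dense:,} params")
print(f"trainable fraction: {adapter / dense:.4%}")
```

For this illustrative layer the adapter holds 65,536 parameters against 16,777,216 in the dense matrix, which is why adapter training fits on far smaller hardware than full fine‑tuning.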

Azure AI Foundry Overview

Azure AI Foundry acts as a one‑stop shop for building, tuning, and serving AI agents.

Core Features

Model catalog – Over 11,000 entries, now including gpt‑oss‑120b and gpt‑oss‑20b.
Training pipelines – Managed compute for fine‑tuning with LoRA, QLoRA, and PEFT.
Secure serving – Low‑latency endpoints protected by Azure’s compliance stack.

Getting Started in Azure

Open Azure Cloud Shell.
Run az foundry model create --name gpt-oss-120b --sku Standard_ND96asr_v4.
Deploy the endpoint with a single CLI call.

The process finishes in minutes, depending on network speed.

Windows AI Foundry and Foundry Local

Windows AI Foundry extends the same capabilities to personal computers.
Foundry Local runs as a lightweight service that downloads the best‑fit binary for the hardware present.

System Requirements

Component	Minimum	Recommended
OS	Windows 10 (x64), Windows 11 (x64/ARM), Windows Server 2025, or macOS	Same
RAM	8 GB	16 GB
Disk	3 GB free	15 GB free
GPU (optional)	Any recent NVIDIA/AMD/Intel	NVIDIA 2000‑series+, AMD 6000‑series+, Intel iGPU, Qualcomm Snapdragon X Elite, Apple silicon
Admin rights	Yes	Yes
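
A quick preflight script can check the disk figures from the table above before anything is installed. This is a sketch using only the Python standard library; it treats the table's "GB" as 2^30 bytes (an assumption, slightly stricter than decimal gigabytes) and does not check RAM or GPU, which have no portable standard‑library API. The function names are illustrative.

```python
import platform
import shutil

GIB = 1024 ** 3
MIN_FREE_GIB = 3    # "Minimum" disk figure from the table above
REC_FREE_GIB = 15   # "Recommended" disk figure from the table above


def classify_free_space(free_bytes: int) -> str:
    """Map free disk space onto the table's minimum/recommended tiers."""
    free_gib = free_bytes / GIB
    if free_gib >= REC_FREE_GIB:
        return "recommended"
    if free_gib >= MIN_FREE_GIB:
        return "minimum"
    return "insufficient"


def preflight(path: str = ".") -> dict:
    """Report OS, CPU architecture, and the disk tier for the given path."""
    usage = shutil.disk_usage(path)
    return {
        "os": platform.system(),
        "arch": platform.machine(),
        "free_gib": round(usage.free / GIB, 1),
        "disk_tier": classify_free_space(usage.free),
    }


if __name__ == "__main__":
    print(preflight())
```

Run it from the drive where models will be stored, since free space is measured for the volume that contains the given path.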

Installation Steps

Windows – Open PowerShell and execute winget install Microsoft.FoundryLocal.
macOS – Run brew tap microsoft/foundrylocal then brew install foundrylocal.

For those who prefer a manual download, the installer is on the project’s GitHub page.

Running Your First Model

After the service starts, open a terminal and type:

foundry model run phi-3.5-mini

The model will download, then you can ask simple questions:

Why do leaves fall?

The reply appears directly in the console.

To swap in another model, replace phi-3.5-mini with any catalog entry, for example gpt-oss-20b.
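
If you script model swaps, a thin wrapper keeps the command construction in one place. The sketch below assumes only the `foundry model run <alias>` form shown in this article; the helper names `build_run_command` and `run_model` are hypothetical, and the wrapper simply hands the terminal over to the CLI's interactive session.

```python
import shutil
import subprocess


def build_run_command(model_alias: str) -> list[str]:
    """Return the argv for `foundry model run <alias>` as used in this article."""
    if not model_alias or any(ch.isspace() for ch in model_alias):
        raise ValueError(f"invalid model alias: {model_alias!r}")
    return ["foundry", "model", "run", model_alias]


def run_model(model_alias: str) -> int:
    """Launch the interactive session and return the CLI's exit code."""
    if shutil.which("foundry") is None:
        raise FileNotFoundError(
            "foundry CLI not found on PATH; install Foundry Local first"
        )
    # stdin/stdout are inherited, so the interactive prompt works as usual.
    return subprocess.run(build_run_command(model_alias)).returncode


if __name__ == "__main__":
    print(" ".join(build_run_command("phi-3.5-mini")))
```

Passing the command as a list rather than a single string avoids shell quoting problems if an alias ever contains unusual characters.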

Using gpt‑oss‑20b on a Local Machine

Running gpt‑oss‑20b requires a GPU with at least 16 GB VRAM and Foundry Local version 0.6.87 or newer.

foundry model run gpt-oss-20b

Check the installed version with foundry --version.

If the command fails, verify the GPU driver and ensure the CUDA toolkit is present.
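
Version checks are easy to get wrong when done as string comparisons. The following sketch compares versions numerically against the 0.6.87 minimum stated above. The exact output format of `foundry --version` is an assumption here, so the parser just takes the first major.minor.patch triple it finds in whatever text it is given.

```python
import re

MIN_VERSION = (0, 6, 87)  # minimum stated in this article for gpt-oss-20b


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the first major.minor.patch triple found in the text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def meets_minimum(text: str, minimum: tuple[int, int, int] = MIN_VERSION) -> bool:
    """Compare numerically, so 0.6.9 correctly sorts below 0.6.87."""
    return parse_version(text) >= minimum


if __name__ == "__main__":
    for candidate in ("0.6.87", "0.6.9", "0.7.0"):
        print(candidate, meets_minimum(candidate))
```

A plain string comparison would rank "0.6.9" above "0.6.87" because the character "9" sorts after "8"; comparing integer tuples avoids that trap.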

Fine‑Tuning and Optimization

Open‑weight models invite developers to adapt them for niche domains.

Parameter‑efficient tuning – Apply LoRA adapters in minutes.
Quantization – Reduce precision to 4‑bit for faster inference on edge devices.
Distillation – Create a smaller student model that mimics the large teacher.
Structured sparsity – Trim unused weights to meet strict memory limits.

All of these steps are supported through Azure AI Foundry pipelines or, where your CLI version provides them, locally via Foundry CLI subcommands such as foundry model fine-tune and foundry model quantize.
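
To show what 4‑bit quantization does to the weights themselves, here is a minimal sketch of symmetric absmax quantization in plain Python. It is a generic textbook scheme, not the method used by Foundry pipelines or by the gpt‑oss checkpoints, and the sample weights are hypothetical.

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization to signed 4-bit integers in [-7, 7]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from 4-bit codes and the scale."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.50, 0.33, 0.07, -0.21, 0.49]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", codes)
print("scale:", round(scale, 4), "worst error:", round(worst_error, 4))
```

Each weight is stored as one of 15 integer levels plus a shared scale, so the rounding error per weight is bounded by half the scale. Production schemes typically quantize in small groups with one scale per group to keep that error low.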

Managing the Local Cache

Downloaded models sit in a cache folder on the device.

foundry cache list
foundry cache clean --max-size 10GB

Regular cleaning prevents disk bloat, especially when testing multiple variants.
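
When testing many variants, it can help to measure the cache folder yourself before deciding what to remove. The sketch below sums file sizes under a directory and compares the total with a budget. The article does not state where the cache lives, so the path is a parameter you supply, and the function names are illustrative.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def over_budget(root: str, budget_bytes: int) -> bool:
    """True when the directory holds more data than the budget allows."""
    return directory_size_bytes(root) > budget_bytes
```

For example, `over_budget(cache_path, 10 * 1024 ** 3)` reports whether the folder at `cache_path` has grown past 10 GiB.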

Upgrading and Uninstalling

Staying current ensures compatibility with the newest model releases.

Upgrade on Windows – winget upgrade --id Microsoft.FoundryLocal.
Upgrade on macOS – brew upgrade foundrylocal.

To remove the service:

Windows – winget uninstall Microsoft.FoundryLocal.
macOS – brew rm foundrylocal && brew untap microsoft/foundrylocal && brew cleanup --scrub.

Real‑World Use Cases

Enterprise Knowledge Assistant

A large retailer integrated gpt‑oss‑120b into their internal search platform.
The model answered policy questions in under two seconds, reducing support tickets by 30 %.

Edge Device Automation

A robotics startup deployed gpt‑oss‑20b on Windows laptops mounted on autonomous drones.
The model executed planning commands without contacting the cloud, preserving bandwidth.

Academic Research

A university lab used the open weights to explore bias mitigation.
They rewrote attention layers and published a paper on transparent LLM adjustments.

Security and Governance

Both Azure AI Foundry and Foundry Local ship with built‑in content safety modules.

Input filtering blocks disallowed requests.
Audit logs record every inference call for compliance tracking.

Open models also allow manual inspection of attention maps, supporting independent security reviews.

Community and Support

The Foundry ecosystem includes a vibrant developer forum, GitHub issues, and a dedicated tech‑community page.
New contributors can submit pull requests for model wrappers or sample pipelines.

Pricing Snapshot

Azure AI Foundry usage follows the standard Managed Compute rates; see the Azure pricing page for details.
Foundry Local itself is free; only hardware and electricity costs apply.

All pricing reflects August 2025 rates.

Tips for a Smooth Experience

Verify GPU drivers are up‑to‑date.
Keep the CLI version aligned with the model catalog.
Use the foundry model list command to see current compatibility.
Reserve at least 15 GB of disk space for caching.
Test with a small model (phi‑3.5‑mini) before pulling the 20 B variant.

Future Directions

Open‑weight releases are expected to continue, expanding the catalog beyond language models to vision and multimodal systems.
Microsoft plans tighter integration with Windows AI Foundry, allowing seamless switching between cloud and edge at runtime.

Quick Reference Cheat Sheet

Action	Command (Windows)	Command (macOS)
Install Foundry Local	`winget install Microsoft.FoundryLocal`	`brew tap microsoft/foundrylocal && brew install foundrylocal`
Run phi‑3.5‑mini	`foundry model run phi-3.5-mini`	Same
Run gpt‑oss‑20b	`foundry model run gpt-oss-20b`	Same
List models	`foundry model list`	Same
Check version	`foundry --version`	Same
Upgrade	`winget upgrade --id Microsoft.FoundryLocal`	`brew upgrade foundrylocal`
Uninstall	`winget uninstall Microsoft.FoundryLocal`	`brew rm foundrylocal && brew untap microsoft/foundrylocal`

Conclusion

Open‑weight models on Azure and Windows give developers the freedom to experiment, customize, and ship AI solutions without compromising on performance or security.
The combination of Azure AI Foundry’s managed services and Foundry Local’s on‑device runtime creates a flexible stack that fits any workflow.
If you have a GPU‑enabled Windows machine or access to Azure compute, you can start today by installing Foundry Local and pulling the gpt‑oss models.

Additional Resources

Official blog post announcing the models
Foundry Local getting‑started guide
Azure AI Foundry model catalog page
GitHub repository for Foundry Local (search “Microsoft/FoundryLocal”)

Explore, tweak, and deploy – the new era of open AI is already here.
