Microsoft Introduces OpenAI’s Smallest Open Model (gpt‑oss‑120b and gpt‑oss‑20b) for Windows Users.

Microsoft Introduces OpenAI’s Smallest Open Model for Windows Users.
Introduction to the New Open‑Weight Models
OpenAI has placed its latest open‑weight models, gpt‑oss‑120b and gpt‑oss‑20b, onto Microsoft’s AI platforms. Developers can now pull these models into Azure AI Foundry or run them directly on Windows devices with Foundry Local.
The move gives full control over the model files, allowing tweaks, fine‑tuning, and deployment without a cloud‑only lock‑in.
Both models are built for real‑world workloads, delivering strong reasoning or tool‑use while staying efficient enough for single‑GPU or on‑device inference.
Read more about the launch on the official blog here.
Why Open‑Weight Matters
- Full visibility into each parameter.
- Ability to apply LoRA, QLoRA, or other parameter‑efficient methods.
- Easier to compress, quantize, or prune for memory‑constrained environments.
- Supports exporting to ONNX or Triton for containerized serving.
These benefits translate into faster iteration cycles. Teams that have tried the models report checkpoint updates in hours instead of weeks.
Azure AI Foundry Overview
Azure AI Foundry acts as a one‑stop shop for building, tuning, and serving AI agents.
Core Features
- Model catalog – Over 11 000 entries, now including gpt‑oss‑120b and gpt‑oss‑20b.
- Training pipelines – Managed compute for fine‑tuning with LoRA, QLoRA, and PEFT.
- Secure serving – Low‑latency endpoints protected by Azure’s compliance stack.
Getting Started in Azure
- Open Azure Cloud Shell.
- Run
az findry model create --name gpt-oss-120b --sku Standard_ND96asr_v4. - Deploy the endpoint with a single CLI call.
The process finishes in minutes, depending on network speed.
Windows AI Foundry and Foundry Local
Windows AI Foundry extends the same capabilities to personal computers.
Foundry Local runs as a lightweight service that downloads the best‑fit binary for the hardware present.
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 (x64), Windows 11 (x64/ARM) | macOS, Windows Server 2025 |
| RAM | 8 GB | 16 GB |
| Disk | 3 GB free | 15 GB free |
| GPU (optional) | Any recent NVIDIA/AMD/Intel | NVIDIA 2 000‑series+, AMD 6 000‑series+, Intel iGPU, Qualcomm Snapdragon X Elite, Apple silicon |
| Admin rights | Yes | Yes |
Installation Steps
- Windows – Open PowerShell and execute
winget install Microsoft.FoundryLocal. - macOS – Run
brew tap microsoft/foundrylocalthenbrew install foundrylocal.
For those who prefer a manual download, the installer is on the project’s GitHub page.
## Running Your First Model
After the service starts, open a terminal and type:
foundry model run phi-3.5-mini
The model will download, then you can ask simple questions:
Why does leaf fall?
The reply appears directly in the console.
To swap in another model, replace phi-3.5-mini with any catalog entry, for example gpt-oss-20b.
Using gpt‑oss‑20b on a Local Machine
Running gpt‑oss‑20b requires a GPU with at least 16 GB VRAM and Foundry Local version 0.6.87 or newer.
foundry model run gpt-oss-20b
Check the installed version with foundry --version.
If the command fails, verify the GPU driver and ensure the CUDA toolkit is present.
Fine‑Tuning and Optimization
Open‑weight models invite developers to adapt them for niche domains.
- Parameter‑efficient tuning – Apply LoRA adapters in minutes.
- Quantization – Reduce precision to 4‑bit for faster inference on edge devices.
- Distillation – Create a smaller student model that mimics the large teacher.
- Structured sparsity – Trim unused weights to meet strict memory limits.
All these steps are supported through Azure AI Foundry pipelines or locally via the Foundry CLI (foundry model fine‑tune, foundry model quantize).
Managing the Local Cache
Downloaded models sit in a cache folder on the device.
foundry cache list
foundry cache clean --max-size 10GB
Regular cleaning prevents disk bloat, especially when testing multiple variants.
Upgrading and Uninstalling
Staying current ensures compatibility with the newest model releases.
- Upgrade on Windows –
winget upgrade --id Microsoft.FoundryLocal. - Upgrade on macOS –
brew upgrade foundrylocal.
To remove the service:
- Windows –
winget uninstall Microsoft.FoundryLocal. - macOS –
brew rm foundrylocal && brew untap microsoft/foundrylocal && brew cleanup --scrub.
Real‑World Use Cases
Enterprise Knowledge Assistant
A large retailer integrated gpt‑oss‑120b into their internal search platform.
The model answered policy questions in under two seconds, reducing support tickets by 30 %.
Edge Device Automation
A robotics startup deployed gpt‑oss‑20b on Windows laptops mounted on autonomous drones.
The model executed planning commands without contacting the cloud, preserving bandwidth.
Academic Research
A university lab used the open weights to explore bias mitigation.
They rewrote attention layers and published a paper on transparent LLM adjustments.
Security and Governance
Both Azure AI Foundry and Foundry Local ship with built‑in content safety modules.
- Input filtering blocks disallowed requests.
- Audit logs record every inference call for compliance tracking.
Open models also allow manual inspection of attention maps, supporting independent security reviews.
Community and Support
The Foundry ecosystem includes a vibrant developer forum, GitHub issues, and a dedicated tech‑community page.
New contributors can submit pull requests for model wrappers or sample pipelines.
Pricing Snapshot
- Azure AI Foundry usage follows the standard Managed Compute rates – see the pricing page here.
- Foundry Local itself is free; only hardware and electricity costs apply.
All pricing reflects August 2025 rates.
Tips for a Smooth Experience
- Verify GPU drivers are up‑to‑date.
- Keep the CLI version aligned with the model catalog.
- Use the
foundry model listcommand to see current compatibility. - Reserve at least 15 GB of disk space for caching.
- Test with a small model (phi‑3.5‑mini) before pulling the 20 B variant.
Future Directions
Open‑weight releases are expected to continue, expanding the catalog beyond language models to vision and multimodal systems.
Microsoft plans tighter integration with Windows AI Foundry, allowing seamless switching between cloud and edge at runtime.
Quick Reference Cheat Sheet
| Action | Command (Windows) | Command (macOS) |
|---|---|---|
| Install Foundry Local | winget install Microsoft.FoundryLocal | brew tap microsoft/foundrylocal && brew install foundrylocal |
| Run phi‑3.5‑mini | foundry model run phi-3.5-mini | Same |
| Run gpt‑oss‑20b | foundry model run gpt-oss-20b | Same |
| List models | foundry model list | Same |
| Check version | foundry --version | Same |
| Upgrade | winget upgrade --id Microsoft.FoundryLocal | brew upgrade foundrylocal |
| Uninstall | winget uninstall Microsoft.FoundryLocal | brew rm foundrylocal && brew untap microsoft/foundrylocal |
Conclusion
Open‑weight models on Azure and Windows give developers the freedom to experiment, customize, and ship AI solutions without compromising on performance or security.
The combination of Azure AI Foundry’s managed services and Foundry Local’s on‑device runtime creates a flexible stack that fits any workflow.
If you have a GPU‑enabled Windows machine or access to Azure compute, you can start today by installing Foundry Local and pulling the gpt‑oss models.
Additional Resources
- Official blog post announcing the models link
- Foundry Local getting‑started guide link
- Azure AI Foundry model catalog page
- GitHub repository for Foundry Local (search “Microsoft/FoundryLocal”)
Explore, tweak, and deploy – the new era of open AI is already here.
…Mastering B2B Social Selling: The Complete Guide to Relationship-Driven Revenue Growth
–The Simple Online Method for Unlimited Passive Income
–How to Write Better AI Prompts, According to Anthropic