personal_project
experimentToolOrchestra — Nemotron-8B
Ran NVIDIA's orchestration model that beats GPT-5 at a third of the cost.
// what problem this solves
Multi-step agentic tasks that require planning, tool selection, and execution chains are expensive when you're routing everything through frontier models. GPT-5 and Claude are powerful but slow and costly at scale — especially for orchestration tasks that don't actually need their full capability.
// what I built
A local implementation of NVIDIA's ToolOrchestra framework running Nemotron-8B — a model RL-trained specifically on tool-use and multi-step orchestration. The setup coordinates multiple tools and sub-agents without touching a paid API, and NVIDIA's benchmarks show it outperforming GPT-5 on orchestration tasks at roughly a third of the cost.
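To make the "coordinates multiple tools" part concrete, here is a minimal sketch of a tool registry of the kind an orchestration framework maintains. The names (`ToolRegistry`, `register`, `specs`) are illustrative, not ToolOrchestra's actual API — the idea is just that each tool pairs a callable with a JSON-schema description the model sees when deciding what to invoke.

```python
from typing import Any, Callable, Dict, List

class ToolRegistry:
    """Maps tool names to callables plus a JSON-schema description
    that gets serialized into the model's prompt (illustrative sketch)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, fn: Callable[..., Any], schema: Dict[str, Any]) -> None:
        self._tools[name] = {"fn": fn, "schema": schema}

    def specs(self) -> List[Dict[str, Any]]:
        # What the model is shown when choosing a tool.
        return [{"name": n, "parameters": t["schema"]} for n, t in self._tools.items()]

    def call(self, name: str, **kwargs: Any) -> Any:
        return self._tools[name]["fn"](**kwargs)

registry = ToolRegistry()
registry.register(
    "search",
    lambda query: f"results for {query}",
    {"type": "object", "properties": {"query": {"type": "string"}}},
)
```

The registry's `specs()` output is what makes tool selection a pure generation problem: the model only ever sees names and schemas, never the implementations.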
// how it works
Nemotron-8B was fine-tuned with reinforcement learning specifically on tool-use traces — not just instruction following. That means it understands when to call a tool, what arguments to pass, and how to chain outputs into next steps without needing to be prompted with explicit chain-of-thought scaffolding. Running it locally via Hugging Face removes the per-token cost entirely. The ToolOrchestra framework handles the execution loop and tool registry.
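The execution loop described above can be sketched in a few lines. This is a hypothetical simplification, not ToolOrchestra's real loop, and the model is stubbed out rather than loading Nemotron-8B: the model either emits a JSON tool call or plain text, and tool outputs are fed back into the context until it produces a final answer.

```python
import json
from typing import Any, Callable, Dict, List

def orchestrate(model: Callable[[List[Dict[str, str]]], str],
                tools: Dict[str, Callable[..., Any]],
                task: str, max_steps: int = 8) -> str:
    """Minimal execution loop (sketch): JSON replies are tool calls,
    anything else is treated as the final answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model(messages)
        try:
            call = json.loads(reply)  # expected shape: {"tool": ..., "args": {...}}
        except json.JSONDecodeError:
            return reply              # plain text -> final answer
        result = tools[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": str(result)})
    return "max steps exceeded"

# Stub standing in for the local Nemotron-8B call: one tool call, then an answer.
def stub_model(messages: List[Dict[str, str]]) -> str:
    if messages[-1]["role"] == "user":
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    return f"answer: {messages[-1]['content']}"

print(orchestrate(stub_model, {"add": lambda a, b: a + b}, "what is 2+3?"))
# prints "answer: 5"
```

Swapping the stub for a real local generate call is the only change needed — the loop itself is model-agnostic, which is what keeps per-inference cost at zero once the weights are on disk.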
// result
- Outperforms GPT-5 on multi-tool orchestration benchmarks
- ~30% of the cost of equivalent frontier model usage
- Runs entirely local — zero API cost per inference
- RL-trained on tool-use traces for native orchestration capability
the stack