personal_project

experiment

ToolOrchestra — Nemotron-8B

Ran NVIDIA's orchestration model that beats GPT-5 at a third of the cost.

Builder·2025

// what problem this solves

Multi-step agentic tasks that require planning, tool selection, and execution chains are expensive when everything routes through frontier models. GPT-5 and Claude are powerful but slow and costly at scale, especially for orchestration steps that don't actually need their full capability.

// what I built

A local implementation of NVIDIA's ToolOrchestra framework running Nemotron-8B, a model RL-trained specifically on tool use and multi-step orchestration. The setup coordinates multiple tools and sub-agents without touching a paid API, and benchmarks show it outperforming GPT-5 on orchestration tasks at roughly a third of the cost.
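As a rough sketch of the kind of tool registry such a framework maintains (the class and method names here are hypothetical illustrations, not ToolOrchestra's actual API), assuming each tool is a named Python callable plus a description the model sees in its prompt:

```python
import json

# Hypothetical registry sketch -- ToolOrchestra's real interface may differ.
class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, fn, description=""):
        """Register a callable under a name the model can reference."""
        self._tools[name] = {"fn": fn, "description": description}

    def describe(self):
        """Tool manifest to embed in the model's system prompt."""
        return json.dumps(
            {name: meta["description"] for name, meta in self._tools.items()}
        )

    def dispatch(self, name, arguments):
        """Invoke a registered tool with model-supplied arguments."""
        return self._tools[name]["fn"](**arguments)

registry = ToolRegistry()
registry.register("add", lambda a, b: a + b, "Add two numbers.")
print(registry.dispatch("add", {"a": 2, "b": 3}))  # prints 5
```

The point of the manifest is that the model only ever sees names and descriptions; the framework, not the model, holds the actual callables.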

// how it works

Nemotron-8B was fine-tuned with reinforcement learning specifically on tool-use traces, not just instruction following. That means it understands when to call a tool, what arguments to pass, and how to chain outputs into next steps without explicit chain-of-thought scaffolding in the prompt. Running it locally via HuggingFace Transformers removes per-token API cost entirely. The ToolOrchestra framework handles the execution loop and tool registry.
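A minimal version of that execution loop can be sketched like this, with a stubbed `generate()` standing in for a local Nemotron-8B call so the control flow is visible. The stub, the message format, and the `"final"` stop condition are illustrative assumptions, not ToolOrchestra internals:

```python
import json

# Stub standing in for a local Nemotron-8B generate() call.
# It emits one tool call, then a final answer built from the tool result.
def generate(messages):
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "search", "arguments": {"query": "GPU price"}})
    last_result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return json.dumps({"final": f"Answer based on: {last_result}"})

# Toy tool table; the real framework would hold registered callables here.
TOOLS = {"search": lambda query: f"top hit for '{query}'"}

def run(prompt, max_steps=5):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        step = json.loads(generate(messages))
        if "final" in step:                    # model decided it is done
            return step["final"]
        result = TOOLS[step["tool"]](**step["arguments"])  # dispatch the call
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("orchestration did not converge")

print(run("What do GPUs cost?"))
# prints: Answer based on: top hit for 'GPU price'
```

The loop shape is the whole trick: the model proposes, the framework executes and appends results, and the cycle repeats until the model emits an answer instead of a call.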

// result

  • Outperforms GPT-4 on multi-tool orchestration benchmarks
  • ~30% of the cost of equivalent frontier model usage
  • Runs entirely local — zero API cost per inference
  • RL-trained on tool-use traces for native orchestration capability

the stack

Nemotron-8B · ToolOrchestra · Python · HuggingFace