GPT OSS 120B

GPT OSS 120B API

Released August 2025
256k tokens context
121.7B parameters

Documentation

gpt-oss-120B is OpenAI's flagship open-weight model in the gpt-oss series, built for advanced reasoning, large-scale agentic workloads, and enterprise-grade automation. It has roughly 120B parameters in a Mixture-of-Experts (MoE) architecture that activates only about 5.1B of them per token during inference, which keeps latency competitive for a model of this size. Designed for complex reasoning, multi-task agents, and long-horizon planning, it is suited to both commercial and self-hosted deployments.

The GPT OSS 120B API supports autonomous agents and multi-step reasoning, function calling and workflow orchestration, research-grade problem solving and planning, enterprise automation across verticals, large-scale code generation and debugging, R&D assistance and scientific exploration, conversational AI and copilots, knowledge extraction and document understanding, long-context business intelligence and analytics, and custom fine-tuning for domain-specific performance.

Its main strengths are the high-capacity MoE design, which gives strong reasoning and generalization, and the small active-parameter count per token, which gives high throughput. It is optimized for production agent and assistant workloads where response quality, latency, and predictable operating cost all matter.

from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",  # replace with your Qubrid API key
)

# Request a streamed chat completion
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms"
        }
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=1,
    stream=True
)

# Print each text delta as it arrives
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print("\n")
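Since function calling is listed among the model's intended uses, a tool-calling request might look like the sketch below. It uses the standard `tools` parameter of the OpenAI Chat Completions API. Whether the Qubrid endpoint accepts `tools` is an assumption, not something this page confirms. The `get_weather` tool, the `build_request` and `dispatch` helpers, and reading the key from a `QUBRID_API_KEY` environment variable are all hypothetical names chosen for illustration.

```python
import json
import os


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real agent would query a weather service."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 21})


# Tool description in the OpenAI Chat Completions `tools` format (JSON Schema).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return a short weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def build_request(question: str) -> dict:
    """Assemble keyword arguments for client.chat.completions.create()."""
    return {
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": question}],
        "tools": TOOLS,
        "max_tokens": 1024,
    }


def dispatch(name: str, arguments: str) -> str:
    """Run the tool the model asked for; `arguments` is a JSON string."""
    handlers = {"get_weather": get_weather}
    return handlers[name](**json.loads(arguments))


def main() -> None:
    # Imported here so the helpers above work without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://platform.qubrid.com/v1",
        api_key=os.environ["QUBRID_API_KEY"],
    )
    response = client.chat.completions.create(**build_request("Weather in Paris?"))
    message = response.choices[0].message
    # Assumption: the endpoint returns tool calls in the usual OpenAI shape.
    for call in message.tool_calls or []:
        result = dispatch(call.function.name, call.function.arguments)
        print(call.function.name, "->", result)
    if message.content:
        print(message.content)


# Only contact the API when a key is actually configured.
if __name__ == "__main__" and os.environ.get("QUBRID_API_KEY"):
    main()
```

In a full agent loop, each tool result would be appended to `messages` as a `tool` role message and the request sent again so the model can produce its final answer. That second round trip is omitted here to keep the sketch short.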

Serverless

API access

Input: $0.15 / 1M tokens
Output: $0.61 / 1M tokens
Deploy using API
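To see what the serverless rates mean for a real workload, the arithmetic can be sketched as below. The two prices are the ones listed above. The helper name `estimate_cost` and the example token counts are assumptions made for illustration, and the sketch ignores any minimums, discounts, or taxes that may apply.

```python
from decimal import Decimal

# Serverless prices from the listing above, in USD per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.15")
OUTPUT_USD_PER_MILLION = Decimal("0.61")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of a request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / MILLION


if __name__ == "__main__":
    # Illustrative workload: 2,000 prompt tokens and 500 completion tokens
    # per request, repeated 10,000 times.
    per_request = estimate_cost(2_000, 500)
    print(f"per request:     ${per_request:.6f}")
    print(f"10,000 requests: ${per_request * 10_000:.2f}")
```

With these example numbers a single request costs $0.000605, so 10,000 such requests come to $6.05. Output tokens cost about four times as much as input tokens, so long completions dominate the bill.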

Dedicated

Cloud GPU VM

Price starts at $1.25 / GPU / hr
Deploy with GPU VM
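A rough way to compare a dedicated GPU VM with serverless pricing is to work out the hourly token volume at which the two cost the same. The sketch below uses the listed starting price of $1.25 per GPU per hour and the serverless rates of $0.15 and $0.61 per 1M tokens. The GPU counts and the 80% input share are hypothetical. The starting price may also not apply to hardware large enough to host this model, so treat the output as an order-of-magnitude guide only.

```python
# Listed prices (USD); see the serverless and GPU VM pricing on this page.
SERVERLESS_INPUT_USD_PER_M = 0.15
SERVERLESS_OUTPUT_USD_PER_M = 0.61
GPU_USD_PER_HOUR = 1.25  # "starts at" price; actual rate depends on the GPU


def blended_price_per_million(input_share: float) -> float:
    """Average serverless price per 1M tokens for a given input/output mix."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return (
        input_share * SERVERLESS_INPUT_USD_PER_M
        + (1.0 - input_share) * SERVERLESS_OUTPUT_USD_PER_M
    )


def break_even_tokens_per_hour(gpu_count: int, input_share: float) -> float:
    """Tokens per hour at which serverless spend equals the hourly VM cost."""
    hourly_vm_cost = gpu_count * GPU_USD_PER_HOUR
    return hourly_vm_cost / blended_price_per_million(input_share) * 1_000_000


if __name__ == "__main__":
    # Hypothetical VM sizes; 80% of tokens assumed to be input tokens.
    for gpus in (1, 2, 4):
        tokens = break_even_tokens_per_hour(gpus, input_share=0.8)
        print(f"{gpus} GPU(s): break-even near {tokens / 1e6:.1f}M tokens/hour")
```

Below the break-even volume, serverless is the cheaper option. Above it, a dedicated VM costs less, provided the VM can actually sustain that throughput.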

Interactive

Playground

Input: $0.15 / 1M tokens
Output: $0.61 / 1M tokens
Chat in Playground

Enterprise
Platform Integration

Docker

Docker Support

Official Docker images for containerized deployments

Kubernetes

Kubernetes Ready

Production-grade K8s manifests and Helm charts

SDK

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java


"Qubrid enabled us to deploy production AI agents with reliable tool-calling and step tracing. We now ship agents faster with full visibility into every decision and API call."

AI Agents Team

Agent Systems & Orchestration