openai/gpt-oss-120b

Introducing gpt-oss-120b, OpenAI's flagship open-weight model in the gpt-oss series, built for advanced reasoning, large-scale agentic workloads, and enterprise-grade automation. With 120B parameters and a highly optimized Mixture-of-Experts (MoE) architecture, it activates 12B parameters during inference, delivering exceptional intelligence while maintaining competitive latency. Designed for complex reasoning, multi-task agents, and long-horizon planning, gpt-oss-120b brings frontier-level capability to commercial and self-hosted deployments.

OpenAI Chat 256k Tokens
Free Trial Credit: $1.00 (No Credit Card Required)

api_example.sh

# Assumes your key is exported as QUBRID_API_KEY in the environment.
# -N turns off curl's output buffering so streamed chunks print as they arrive.
curl -N -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 4096,
    "stream": true,
    "top_p": 1
  }'
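
With "stream": true, the response arrives incrementally rather than as a single JSON document. Assuming the endpoint follows the OpenAI-style server-sent events convention (lines prefixed with "data: " and a final "data: [DONE]" marker), which this page does not confirm, a small filter like the sketch below can strip that framing. The function name sse_payloads is hypothetical.

```shell
#!/bin/sh
# sse_payloads: read an event stream on stdin and print one JSON chunk per line.
# ASSUMPTION: OpenAI-style framing ("data: <json>" lines, then "data: [DONE]").
sse_payloads() {
  # Drop carriage returns first, since event streams often use CRLF line endings.
  tr -d '\r' | while IFS= read -r line; do
    case "$line" in
      "data: [DONE]") break ;;                           # end-of-stream marker
      "data: "*) printf '%s\n' "${line#data: }" ;;       # strip the "data: " prefix
    esac                                                 # blank and other lines are skipped
  done
}

# Local demonstration with a canned stream (no network access needed).
printf 'data: {"id":"1"}\n\ndata: [DONE]\n' | sse_payloads
```

In practice the filter would sit at the end of the request pipeline, for example curl -N ... | sse_payloads, leaving each chunk on its own line for a JSON tool to parse.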

Technical Specifications

Model Architecture & Performance

Model Size 121.7B Params
Context Length 256k Tokens
Quantization fp16
Tokens/Second 389
Architecture Large-Scale Mixture-of-Experts (MoE) with adaptive routing, SwiGLU activations, hierarchical sparse attention, and token-choice MoE for reasoning efficiency
Precision Native support for FP4 (MXFP4), FP8, BF16, and tensor parallel inference
License Apache 2.0
Release Date August 2025
Developers OpenAI
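
For self-hosted planning, the weights-only memory footprint can be approximated as parameter count times bits per weight. The sketch below is an illustrative estimate, not a published requirement: the helper name weight_gb is hypothetical, and the result ignores KV cache, activations, and the per-block scale overhead of formats such as MXFP4.

```shell
#!/bin/sh
# weight_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# Weights-only estimate in GB (10^9 bytes): billions of params * bits / 8.
# ASSUMPTION: every parameter is stored at the same width, which real
# checkpoints may not do.
weight_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p * b / 8 }'
}

# Compare the listed 121.7B parameters at 16-, 8-, and 4-bit widths.
for bits in 16 8 4; do
  printf '%2s-bit: %s GB\n' "$bits" "$(weight_gb 121.7 "$bits")"
done
```

Actual serving memory will be higher once context-dependent buffers are included, so these numbers are best read as lower bounds.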

Pricing

Pay-per-use, no commitments

Input Tokens $0.15/1M Tokens
Output Tokens $0.61/1M Tokens
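
To make the per-token rates concrete, a request's cost is input tokens times $0.15 plus output tokens times $0.61, divided by one million. The helper below is a minimal sketch of that arithmetic; the name estimate_cost is hypothetical, and it assumes billing is strictly linear in token counts with no minimums or rounding rules.

```shell
#!/bin/sh
# estimate_cost INPUT_TOKENS OUTPUT_TOKENS
# Prints the estimated cost in US dollars using the rates listed above.
# ASSUMPTION: purely linear pricing; any billing minimums are not modeled.
estimate_cost() {
  awk -v i="$1" -v o="$2" \
    'BEGIN { printf "%.6f\n", (i * 0.15 + o * 0.61) / 1000000 }'
}

# A 2,000-token prompt with a 500-token reply.
estimate_cost 2000 500   # prints 0.000605
```

At these rates, one million input tokens plus one million output tokens comes to $0.76.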

API Reference

Complete parameter documentation

Parameter    Type     Default  Description
stream       boolean  true     Enable streaming responses for real-time output.
temperature  number   0.7      Controls randomness. Higher values give more creative but less predictable output.
max_tokens   number   4096     Maximum number of tokens to generate in the response.
top_p        number   1        Nucleus sampling: samples only from the most likely tokens whose cumulative probability reaches top_p.
effort       select   medium   Controls how much reasoning effort the model should apply.
summary      select   concise  Controls the level of explanation in the reasoning summary.
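
The parameters above can be combined into one request body. The sketch below builds that body from the documented defaults; the function name build_payload is hypothetical, and placing effort and summary as top-level fields is an assumption, since the table does not say where they belong in the request.

```shell
#!/bin/sh
# build_payload PROMPT [TEMPERATURE] [MAX_TOKENS] [EFFORT] [SUMMARY] [STREAM]
# Prints a JSON request body; omitted arguments fall back to the table defaults.
# ASSUMPTION: "effort" and "summary" are top-level fields (not confirmed here).
build_payload() {
  prompt=$1
  temperature=${2:-0.7}
  max_tokens=${3:-4096}
  effort=${4:-medium}
  summary=${5:-concise}
  stream=${6:-true}
  # Escape backslashes and double quotes so the prompt stays a valid JSON string.
  # Newlines and control characters in the prompt are not handled by this sketch.
  escaped=$(printf '%s' "$prompt" | sed 's/\\/\\\\/g; s/"/\\"/g')
  cat <<EOF
{
  "model": "openai/gpt-oss-120b",
  "messages": [{"role": "user", "content": "$escaped"}],
  "temperature": $temperature,
  "max_tokens": $max_tokens,
  "top_p": 1,
  "stream": $stream,
  "effort": "$effort",
  "summary": "$summary"
}
EOF
}

# Print a body that uses every default.
build_payload "Explain quantum computing in simple terms"
```

The output can be piped straight into the request, for example build_payload "..." | curl ... -d @-, where -d @- tells curl to read the body from standard input.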

Explore the full request and response schema in our external API documentation

Performance

Strengths & considerations

Strengths

High-capacity MoE design for strong reasoning and generalization
Optimized activation load for high throughput (12B active parameters)
State-of-the-art performance under native FP4 and FP8 quantization
Scales across multi-GPU clusters and distributed inference setups
Up to 256K context window with efficient sparse attention
Superior agentic and planning abilities for sequential decision tasks
Built-in support for structured schema-based function calling
Apache 2.0 license enabling commercial and derivative use

Considerations

Higher compute and memory requirements compared to smaller gpt-oss models
Latency may increase on single-GPU deployments
Fine-tuning recommended for highly specialized enterprise domains
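
As an illustration of the schema-based function calling mentioned above, the sketch below prints a request body that declares one tool. It assumes the endpoint accepts an OpenAI-style tools array, which this page does not confirm; the function name tool_payload and the get_weather tool are hypothetical.

```shell
#!/bin/sh
# tool_payload: print a request body that offers the model one callable tool.
# ASSUMPTION: the API accepts an OpenAI-style "tools" array with JSON Schema
# parameters. "get_weather" is a made-up tool used only for illustration.
tool_payload() {
  cat <<'EOF'
{
  "model": "openai/gpt-oss-120b",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Hypothetical tool: look up current weather for a city.",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}
EOF
}

tool_payload
```

Under that convention, the model would answer with the tool name and JSON arguments matching the schema, and the calling application would run the tool and send the result back in a follow-up message.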

Use cases

Recommended applications for this model

Autonomous agents and multi-step reasoning
Advanced function calling and workflow orchestration
Research-grade problem solving and planning
Enterprise automation across verticals
Large-scale code generation and debugging
R&D assistance and scientific exploration
Conversational AI and smart copilots
Knowledge extraction and document understanding
Long-context business intelligence and analytics
Custom fine-tuning for domain-specific performance

Enterprise
Platform Integration

Docker Support

Official Docker images for containerized deployments

Kubernetes Ready

Production-grade K8s manifests and Helm charts

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.

"Qubrid enabled us to deploy production AI agents with reliable tool-calling and step tracing. We now ship agents faster with full visibility into every decision and API call."

AI Agents Team

Agent Systems & Orchestration