Model Catalog

One API for all models. Search our library, then deploy and run inference on NVIDIA GPUs in seconds.

Unlock $1 free API credit on first recharge - generate up to ~4M tokens

Qwen/Qwen3.5-122B-A10B
Vision

Qwen3.5-122B-A10B is the most powerful open-source model in the Qwen3.5 Medium Series. With 122B total parameters and 10B active per token across a 48-layer hybrid architecture, it delivers the strongest knowledge, vision, and function-calling performance in the medium class — scoring 86.6% on GPQA Diamond (beating GPT-5 mini's 82.8%), 72.2% on BFCL-V4 tool calling (vs GPT-5 mini's 55.5%), 92.1% on OCRBench, and 83.9% on MMMU. Supports text, image, and video input natively via early fusion.
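The BFCL-V4 tool-calling score above corresponds to OpenAI-style function calling in practice. A minimal sketch of such a request follows; the request schema and the `get_weather` tool are illustrative assumptions, not Qubrid's documented API.

```python
def build_tool_call_request(user_message: str) -> dict:
    """Build a chat request exposing one tool to the model (assumed schema)."""
    return {
        "model": "Qwen/Qwen3.5-122B-A10B",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # illustrative tool, not a real API
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("What's the weather in Tokyo?")
```

The model replies with a structured tool call naming the function and its arguments, which your code executes before returning the result in a follow-up message.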

Qwen/Qwen3.5-397B-A17B
Vision

Qwen3.5-397B-A17B is Alibaba's flagship open-source model — the first in the Qwen3.5 series (released February 16, 2026) and the most capable open-weight model in the family. It is a native multimodal model trained from scratch on trillions of text, image, and video tokens using early fusion across 201 languages. With 397B total parameters and 17B active per token, it outperforms all Qwen3-VL models on vision tasks while matching or exceeding frontier text-only models. The hosted version is called Qwen3.5-Plus.

nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
Chat

Nemotron 3 Nano 30B-A3B is NVIDIA's flagship open reasoning model, featuring a revolutionary hybrid Mamba-Transformer Mixture-of-Experts architecture. With 31.6B total parameters but only 3.2B active per forward pass, it delivers up to 3.3× higher throughput than comparable models while achieving state-of-the-art accuracy on reasoning, coding, and agentic benchmarks. The model supports up to 1M token context length and features configurable reasoning depth with thinking budget control.
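The configurable thinking budget mentioned above would typically surface as a request parameter. The sketch below uses `max_thinking_tokens` as a placeholder field name, since the exact parameter is not specified here; check the model card for the real one.

```python
def build_reasoning_request(prompt: str, thinking_budget: int) -> dict:
    """Build a chat request with a capped reasoning budget (placeholder field)."""
    if thinking_budget < 0:
        raise ValueError("thinking_budget must be non-negative")
    return {
        "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
        "messages": [{"role": "user", "content": prompt}],
        # Placeholder: the catalog mentions thinking-budget control but not
        # the parameter name.
        "max_thinking_tokens": thinking_budget,
    }

req = build_reasoning_request("Plan a 3-step refactor.", thinking_budget=1024)
```

A small budget trades reasoning depth for latency; a large one lets the model think longer on hard problems before answering.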

openai/gpt-oss-120b
Chat

gpt-oss-120b is OpenAI's flagship open-weight model in the gpt-oss series, built for advanced reasoning, large-scale agentic workloads, and enterprise-grade automation. With roughly 117B total parameters and a Mixture-of-Experts (MoE) architecture that activates about 5.1B parameters per token, it delivers strong intelligence while maintaining competitive latency. Designed for complex reasoning, multi-task agents, and long-horizon planning, gpt-oss-120b brings frontier-level capability to commercial and self-hosted deployments.

p-image-edit-lora
Image

P-Image Edit LoRA combines the speed and quality of P-Image Edit with the flexibility of Low-Rank Adaptation (LoRA) fine-tuning. This enables custom style transfer, character consistency, brand-specific aesthetics, and domain-specific edits using pre-trained LoRA weights from HuggingFace. The model maintains sub-second inference while applying custom adaptations, making it ideal for production workflows requiring consistent styling or specialized editing capabilities.
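Applying a pre-trained LoRA at request time commonly means passing a repository id and a blend strength alongside the edit prompt. A hypothetical sketch, assuming an OpenAI-style JSON body; the field names and the `your-org/watercolor-style` repo are placeholders, not documented parameters:

```python
def build_image_edit_request(image_url: str, prompt: str,
                             lora_repo: str, lora_scale: float = 0.8) -> dict:
    """Build an image-edit request attaching LoRA weights (placeholder schema)."""
    if not 0.0 <= lora_scale <= 2.0:
        raise ValueError("lora_scale outside a sensible range")
    return {
        "model": "p-image-edit-lora",
        "image": image_url,
        "prompt": prompt,
        # Placeholder fields: a Hugging Face repo id plus a blend strength.
        "lora": {"repo": lora_repo, "scale": lora_scale},
    }

req = build_image_edit_request(
    "https://example.com/product.png",   # placeholder image URL
    "apply brand watercolor style",
    lora_repo="your-org/watercolor-style",  # placeholder repo id
)
```

Lower scales keep the base model's edit behavior dominant; higher scales push harder toward the LoRA's learned style.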


Access enterprise-grade open-source AI models including Llama 3, DeepSeek, Qwen, and more via our high-performance serverless API. Experience low-latency inference on the latest NVIDIA GPUs optimized for production workloads.
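As a concrete starting point, serverless calls usually follow the OpenAI-compatible chat-completions shape. This sketch only builds the JSON body; the endpoint URL and auth header in the trailing comment are assumptions to be replaced with values from your dashboard.

```python
def build_chat_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_payload("openai/gpt-oss-120b",
                             "Explain MoE routing in one sentence.")
# POST as JSON with your key (endpoint is a placeholder):
#   requests.post("https://<your-endpoint>/v1/chat/completions",
#                 json=payload, headers={"Authorization": "Bearer <API_KEY>"})
```

Because the body is the standard chat-completions shape, existing OpenAI-compatible client libraries can be pointed at the endpoint by overriding their base URL.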

"Qubrid AI reduced our document processing time by over 60% and significantly improved retrieval accuracy across our RAG workflows."

Enterprise AI Team

Document Intelligence Platform