Model overview
Our reasoning models are built for deep analytical work: logical reasoning, math, coding, and long-running agent tasks.
Models
Step 3.7 Flash
Recommended. Our flagship multimodal reasoning model. Building on the high-throughput reasoning and tool-calling capabilities of step-3.5-flash, it adds native multimodal input, understanding images and videos directly without an additional vision MCP or auxiliary model. It is powered by a 198B-parameter / 11B-activation sparse MoE architecture and offers three reasoning effort levels (low / medium / high). A fast and dependable model for agent, coding, and multimodal workloads. 256K context.
View detailed documentation
Step 3.5 Flash
Text-only reasoning. Our flagship language reasoning model. It delivers top-tier reasoning quality and fast, reliable execution: decomposing and planning complex tasks, and reliably orchestrating tool calls. Suitable for logical reasoning, math, software engineering, deep research, and other complex workloads. 256K context.
Context length
Context length is how much input a model can "look back" and consider when generating a response. A longer context lets the model use more history, improving coherence and accuracy. The limit applies to both input and output: total tokens (not characters) cannot exceed the model's context window.

| Model | Context length |
|---|---|
| Step 3.7 Flash | 256K |
| Step 3.5 Flash | 256K |
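
Because the limit covers input and output together, it can help to check a request's token budget before sending it. The following is a minimal sketch of that arithmetic, not an official utility. It assumes "256K" means 256,000 tokens (the exact figure may differ), and the function names are hypothetical. Token counts must come from a real tokenizer, which is outside the scope of this sketch.

```python
# Assumption: "256K" is read as 256,000 tokens; confirm the exact window size.
CONTEXT_WINDOW_TOKENS = 256_000


def remaining_output_budget(input_tokens: int,
                            context_window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Return how many tokens are left for output once the input is counted."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return max(context_window - input_tokens, 0)


def fits_context(input_tokens: int,
                 requested_output_tokens: int,
                 context_window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Check that input plus requested output stays within the window."""
    return input_tokens + requested_output_tokens <= context_window


# Example: a 200,000-token prompt leaves room for 56,000 output tokens.
budget = remaining_output_budget(200_000)
```

A caller could use `fits_context` to decide whether to trim history or lower the requested output length before making a request.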
Quickstart
Reasoning model best practices
See recommended prompting and usage patterns for complex reasoning workloads.
Migrate from OpenAI
Switch existing OpenAI-compatible integrations to Stepfun with minimal code changes.
Multi-turn conversations
Store message history and pass context back to the model for continuous dialogue.
JSON Mode
Return machine-parseable JSON so model output can plug into application logic.
Streaming responses
Stream tokens to the UI as they are generated for a faster perceived response.
Tool Call
Let the model invoke tools and external systems to complete real tasks.
Prompt cache
Reuse repeated context to reduce cost and improve latency in repeated requests.
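
To make the multi-turn pattern listed above more concrete, here is a minimal sketch of storing message history and passing it back on each turn. It makes no network call. The `Conversation` class is hypothetical, and the `model` / `messages` payload shape with `role` and `content` keys is an assumption based on the common OpenAI-style chat format, so verify field names and the model identifier against the API reference before use.

```python
from dataclasses import dataclass, field

# Assumption: roles follow the common OpenAI-style chat convention.
ALLOWED_ROLES = {"system", "user", "assistant"}


@dataclass
class Conversation:
    """Keeps the running message history for one dialogue (hypothetical helper)."""
    # Identifier as written in the overview text; verify before use.
    model: str = "step-3.5-flash"
    messages: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        """Append one message to the history."""
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role}")
        self.messages.append({"role": role, "content": content})

    def to_request(self) -> dict:
        """Build a request body that carries the full history so far."""
        return {"model": self.model, "messages": list(self.messages)}


chat = Conversation()
chat.add("user", "What is 12 * 12?")
chat.add("assistant", "144")            # the model's earlier reply, stored locally
chat.add("user", "Now divide that by 6.")
payload = chat.to_request()             # send this body with your HTTP client of choice
```

Since every turn resends the whole history, the total token count grows with the conversation and remains subject to the context length limit.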