Skip to main content

Model overview

Our reasoning models are built for deep analytical work โ€” logical reasoning, math, coding, and long-running agent tasks.

Models

Step 3.7 Flash

Recommended. Our flagship multimodal reasoning model. Building on the high-throughput reasoning and tool-calling capabilities of step-3.5-flash, it adds native multimodal input โ€” understanding images and videos directly, without an additional vision MCP or auxiliary model. Powered by a 198B-parameter / 11B-activation sparse MoE architecture and offering three reasoning effort levels (low / medium / high). A fast and dependable model for agent, coding, and multimodal workloads. 256K context. View detailed documentation โ†’

Step 3.5 Flash

Text-only reasoning. Our flagship language reasoning model. It delivers top-tier reasoning quality and fast, reliable execution โ€” decomposing and planning complex tasks, and reliably orchestrating tool calls. Suitable for logical reasoning, math, software engineering, deep research, and other complex workloads. 256K context.

Context length

Context length is how much input a model can โ€œlook backโ€ and consider when generating a response. A longer context lets the model use more history, improving coherence and accuracy. The limit applies to both input and output โ€” total tokens (not characters) cannot exceed the modelโ€™s context window.
ModelContext length
Step 3.7 Flash256K
Step 3.5 Flash256K

Quickstart

Reasoning model best practices

See recommended prompting and usage patterns for complex reasoning workloads.

Migrate from OpenAI

Switch existing OpenAI-compatible integrations to Stepfun with minimal code changes.

Multi-turn conversations

Store message history and pass context back to the model for continuous dialogue.

JSON Mode

Return machine-parseable JSON so model output can plug into application logic.

Streaming responses

Stream tokens to the UI as they are generated for a faster perceived response.

Tool Call

Let the model invoke tools and external systems to complete real tasks.

Prompt cache

Reuse repeated context to reduce cost and improve latency in repeated requests.