Call the Chat Completions API to get a model-generated chat response.

Endpoint

POST https://api.stepfun.ai/v1/chat/completions

Request Parameters

  • model string required
    Name of the model to use
  • messages object array required
    The conversation so far: a list of messages of different categories, either entered by the user or generated by the model.
  • tools object array optional
    List of functions the model may call (tool calls).
  • max_tokens int optional
    Maximum number of tokens to generate for the chat. Default is INF (no limit, determined automatically by the model). The total of input and generated tokens is limited by the specified model’s maximum context length.
  • temperature float optional
    Sampling temperature, a number between 0.0 and 2.0. Higher values (e.g., 0.8) make the output more random; lower values (e.g., 0.2) make it more focused and deterministic. Default is 0.5.
  • top_p float optional
    Top-p (nucleus) sampling. The model samples only from the smallest set of tokens whose cumulative probability reaches top_p. Default is 0.9.
  • n int optional
    How many response messages the model generates for each input. Default is 1. There is no hard maximum, but values up to 5 are suggested.
  • stream bool optional
    Whether to stream the response messages. Default is false.
  • stop string | string array optional
    Generation stops when the model produces any of the given stop strings. Empty by default.
  • frequency_penalty float optional
    A number between 0.0 and 1.0. Higher values penalize tokens that have already appeared frequently in the generated text, which reduces repetition. Default is 0.
  • response_format object optional
    Used to instruct the model to output in a specific format. Default is {"type":"text"}, meaning text output. Set { "type": "json_object" } to turn on JSON Mode and output parseable JSON.
  • reasoning_format string optional
    Selects which field carries the model's reasoning in the output. Options are general and deepseek-style. The default, general, returns the reasoning in a reasoning field; deepseek-style returns it in the DeepSeek-compatible reasoning_content field.
  • reasoning_effort string optional
    Controls how much reasoning the model performs. Models that support three reasoning tiers accept low, medium, high; step-3.5-flash-2603 accepts low and high only. Higher values produce deeper reasoning but may take longer to respond.
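
The parameter ranges above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the StepFun API: the helper name build_chat_payload, its signature, the use of ValueError, and the treatment of every model other than step-3.5-flash-2603 as three-tier are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Hypothetical helper; name and error handling are illustrative assumptions.
# Ranges and option names come from the parameter list above.
TWO_TIER_MODELS = {"step-3.5-flash-2603"}  # accepts low and high only


def build_chat_payload(
    model: str,
    messages: List[Dict[str, Any]],
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    max_tokens: Optional[int] = None,
    reasoning_effort: Optional[str] = None,
    json_mode: bool = False,
) -> Dict[str, Any]:
    """Assemble a request body, leaving out optional fields that are None."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if frequency_penalty is not None and not 0.0 <= frequency_penalty <= 1.0:
        raise ValueError("frequency_penalty must be between 0.0 and 1.0")
    if reasoning_effort is not None:
        # Assumption: any model not listed as two-tier accepts all three tiers.
        if model in TWO_TIER_MODELS:
            allowed = {"low", "high"}
        else:
            allowed = {"low", "medium", "high"}
        if reasoning_effort not in allowed:
            raise ValueError("reasoning_effort must be one of %s" % sorted(allowed))

    payload: Dict[str, Any] = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
    }
    # Omitted fields fall back to the server-side defaults listed above.
    payload.update({k: v for k, v in optional.items() if v is not None})
    if json_mode:
        payload["response_format"] = {"type": "json_object"}
    return payload
```

Leaving unset fields out of the body, rather than sending the documented defaults explicitly, keeps the request short and lets the server apply its own defaults.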

Response format

Non-streaming response

When stream=false (default), the API returns a single Chat Completion object.

Attributes

  • id string
    Chat response ID.
  • object string
    Response object type, always chat.completion.
  • model string
    Model name.
  • created timestamp
    Unix timestamp (seconds) when the response was generated.
  • choices object array
    List of response choices.
  • usage object
    Token usage statistics.

Example

{
  "id": "b7b56af0-52a6-483f-a589-948182676a1b",
  "object": "chat.completion",
  "created": 1709893411,
  "model": "step-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! StepFun specializes in artificial intelligence and provides AI solutions across NLP, computer vision, and machine learning to help customers improve efficiency and create value across industries."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 83, "completion_tokens": 176, "total_tokens": 259 }
}
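
As a rough illustration of reading such an object, the sketch below pulls the reply text, the finish reason, and the token counts out of a parsed response. The function name summarize_completion is hypothetical, and the code assumes the response has the shape shown in the example above, with at least one choice.

```python
import json
from typing import Any, Dict


def summarize_completion(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the commonly used fields from a chat.completion object."""
    # Assumption: the first choice is the one of interest (n defaults to 1).
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Shortened copy of the example above; the content string is abbreviated.
EXAMPLE = json.loads("""
{
  "id": "b7b56af0-52a6-483f-a589-948182676a1b",
  "object": "chat.completion",
  "created": 1709893411,
  "model": "step-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 83, "completion_tokens": 176, "total_tokens": 259}
}
""")

summary = summarize_completion(EXAMPLE)
```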

Streaming response

When stream=true, the API streams a series of chat.completion.chunk events.

Attributes

  • id string
    Generated chat response ID.
  • object string
    Response object type, always chat.completion.chunk.
  • created timestamp
    Unix timestamp (seconds) when the chunk was generated.
  • model string
    Model name.
  • choices object array
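    List of streamed response choices. Each choice carries an index, a delta holding the incremental role and content, and a finish_reason that stays empty until the final chunk.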
  • usage object
    Token usage statistics.

Example

data: {"id":"d7ae7c4a-1524-4fe5-9d58-e4d59b89d8f0","object":"chat.completion.chunk","created":1709899323,"model":"step-3.5-flash","choices":[{"index":0,"delta":{"role":"","content":"Hello"},"finish_reason":""}],"usage":{"prompt_tokens":83,"completion_tokens":1,"total_tokens":84}}

...

data: {"id":"d7ae7c4a-1524-4fe5-9d58-e4d59b89d8f0","object":"chat.completion.chunk","created":1709899323,"model":"step-3.5-flash","choices":[{"index":0,"delta":{"role":"","content":"value"},"finish_reason":""}],"usage":{"prompt_tokens":83,"completion_tokens":148,"total_tokens":231}}

data: {"id":"d7ae7c4a-1524-4fe5-9d58-e4d59b89d8f0","object":"chat.completion.chunk","created":1709899323,"model":"step-3.5-flash","choices":[{"index":0,"delta":{"role":"","content":""},"finish_reason":"stop"}],"usage":{"prompt_tokens":83,"completion_tokens":150,"total_tokens":233}}

data: [DONE]
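
A client typically reassembles the streamed text by concatenating the delta content of each chunk until the [DONE] marker arrives. The sketch below shows one way to do that over an iterable of already-decoded lines; the function names iter_chunks and collect_text are hypothetical, and the code assumes each event fits on a single data: line, as in the example above.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield parsed chat.completion.chunk objects until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> Tuple[str, str]:
    """Concatenate the delta content of choice 0 and return (text, finish_reason)."""
    parts = []
    finish_reason = ""
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch follows only the first choice
            parts.append(choice.get("delta", {}).get("content") or "")
            # finish_reason is an empty string until the final chunk
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason
```

Because collect_text accepts any iterable of strings, it can be fed lines read from an HTTP response or from a saved transcript without modification.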

Examples

Note: The basic examples below use step-3.5-flash by default. To use the three reasoning tiers, see the step-3.7-flash call in the “Reasoning Effort” example.
curl https://api.stepfun.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $STEP_API_KEY" \
  -d '{
    "model": "step-3.7-flash",
    "messages": [
      {
        "role": "user",
        "content": "Explain what reinforcement learning is in three sentences."
      }
    ],
    "reasoning_effort": "medium",
    "max_tokens": 1024
  }'
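
For readers calling the endpoint from Python rather than curl, the same request can be sketched with the standard library alone. The helper names build_request and send are hypothetical, the timeout value is an arbitrary choice, and the STEP_API_KEY environment variable simply mirrors the curl example above.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.stepfun.ai/v1/chat/completions"


def build_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request with the same headers as the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and parse the non-streaming JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


PAYLOAD = {
    "model": "step-3.7-flash",
    "messages": [
        {
            "role": "user",
            "content": "Explain what reinforcement learning is in three sentences.",
        }
    ],
    "reasoning_effort": "medium",
    "max_tokens": 1024,
}

# Usage (performs a network call, so it is left commented out):
# import os
# response = send(build_request(os.environ["STEP_API_KEY"], PAYLOAD))
# print(response["choices"][0]["message"]["content"])
```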