StepFun provides developers with voice interaction models that support audio generation and voice cloning. By integrating these models, applications can extend beyond standard large language model understanding and enable voice interaction.

Quick Start

Quickly Generate an Audio Clip

Copy the following command to generate an audio file. It reads your API key from the STEP_API_KEY environment variable and saves the result as step.mp3.
curl --location 'https://api.stepfun.ai/v1/audio/speech' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $STEP_API_KEY" \
--data '{
   "model":"step-tts-2",
   "input":"StepFun is building the next generation of AGI.",
   "voice":"lively-girl"
}' \
--output "step.mp3"

Voice Recommendations by Scenario

StepFun offers dozens of recommended voices across seven major scenarios. You can preview the voices through their audio samples and use any of them via the API by passing its voice ID. We strongly recommend using voice cloning to create custom voices: the step-tts-2 model delivers industry-leading cloning performance, and cloned voices support all emotion and style controls at no additional cost.

1. Marketing

Marketing scenarios require voices with charisma, persuasiveness, and warmth that can effectively convey product value and inspire purchase intent. Step-TTS delivers full emotional expression, building trust and professionalism to make marketing content more compelling.
| Supported Models | Voice Name | Voice ID | Audio Samples |
|---|---|---|---|
| stepaudio-2.5-tts / step-tts-2 | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Upright Youth | zhengpaiqingnian | Sample 1 · Sample 2 |

2. Customer Service

Customer service scenarios require voices that are warm, patient, and professional, capable of calming users and providing clear solutions. We offer two types of customer service voices. The step-tts-2 voices stand out for rich audio quality, full emotion, and a lifelike human feel, which makes the first four recommendations especially suited to phone scenarios.
| Supported Models | Voice Name | Voice ID | Audio Samples |
|---|---|---|---|
| stepaudio-2.5-tts / step-tts-2 | Straightforward Male | shuangkuainansheng | Sample 1 · Sample 2 · Sample 3 |
| stepaudio-2.5-tts / step-tts-2 | Capable Female | ganliannvsheng | Sample 1 · Sample 2 · Sample 3 |
| stepaudio-2.5-tts / step-tts-2 | Warm Female | qinhenvsheng | Sample 1 · Sample 2 · Sample 3 |
| stepaudio-2.5-tts / step-tts-2 | Energetic Female | huolinvsheng | Sample 1 · Sample 2 · Sample 3 |
| stepaudio-2.5-tts / step-tts-2 | Elegant Gentle | elegantgentle-female | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Gentle Male | wenrounansheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Classic Female | jingdiannvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Mature Gentle | wenroushunv | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Sweet Female | tianmeinvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Pure Girl | qingchunshaonv | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Spirited Male | yuanqinansheng | Sample 1 · Sample 2 |

3. Audiobook

Audiobooks require voices that are expressive and emotionally engaging, capable of vividly bringing different characters and story atmospheres to life. Our TTS stands out with its delicate emotional expression and versatile vocal styles, enabling listeners to fully immerse themselves in the world of the story.
| Supported Models | Voice Name | Voice ID | Audio Samples |
|---|---|---|---|
| stepaudio-2.5-tts / step-tts-2 | Lively Girl | lively-girl | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Scholarly Gentleman | ruyananshi | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Gentle Female | wenrounvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Tender Gentleman | wenrougongzi | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Magnetic Male | cixingnansheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Spirited Girl | yuanqishaonv | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Upright Youth | zhengpaiqingnian | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Spirited Male | yuanqinansheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Broadcast Male | boyinnansheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Deep Male | shenchennanyin | Sample 1 · Sample 2 |

4. Emotional Companionship

Emotional companionship requires voices that are warm, gentle, and empathetic, capable of providing users with comfort and psychological support. Our TTS features delicate, soothing voice timbres with strong emotional expressiveness, helping you create a safe and comforting interaction environment for users.
| Supported Models | Voice Name | Voice ID | Audio Samples |
|---|---|---|---|
| stepaudio-2.5-tts / step-tts-2 | Soft-spoken Gentleman | soft-spoken-gentleman | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Elegant Gentle | elegantgentle-female | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Gentle Male | wenrounansheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Tender Gentleman | wenrougongzi | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Classic Female | jingdiannvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Friendly Female | qinqienvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Sweet Female | tianmeinvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Magnetic Male | cixingnansheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Spirited Girl | yuanqishaonv | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Girl Next Door | linjiajiejie | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Scholarly Gentleman | ruyananshi | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Deep Male | shenchennanyin | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Gentle Female | wenrounvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Cute Soft Female | ruanmengnvsheng | Sample 1 · Sample 2 |

5. Voice Assistant

Voice assistant scenarios require voices that are clear, natural, and efficient, capable of accurately understanding and responding to user commands. Our TTS features natural prosody and full emotional expression, making your voice assistant both professional and approachable.
| Supported Models | Voice Name | Voice ID | Audio Samples |
|---|---|---|---|
| stepaudio-2.5-tts / step-tts-2 | Elegant Gentle | elegantgentle-female | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Pure Girl | qingchunshaonv | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Spirited Girl | yuanqishaonv | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Girl Next Door | linjiajiejie | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Scholarly Gentleman | ruyananshi | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Clever Girl | jilingshaonv | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Cute Soft Female | ruanmengnvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Kid Sister | linjiameimei | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Intellectual Lady | zhixingjiejie | Sample 1 · Sample 2 |

6. Video Dubbing

Video dubbing requires voices that are expressive, rhythmic, and visually evocative, capable of blending seamlessly with visual content. Our TTS excels in precise emotional delivery and fine-grained speech rhythm control, enhancing the impact and overall appeal of your videos.
| Supported Models | Voice Name | Voice ID | Audio Samples |
|---|---|---|---|
| stepaudio-2.5-tts / step-tts-2 | Vibrant Youth | vibrant-youth | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Magnetic-voiced Male | magnetic-voiced-male | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Girl Next Door | linjiajiejie | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Kid Sister | linjiameimei | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | College Student | qingniandaxuesheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Cute Soft Female | ruanmengnvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Elegant Female | youyanvsheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Cool Beauty | lengyanyujie | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Intellectual Lady | zhixingjiejie | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Bold Sister | shuangkuaijiejie | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Quiet Scholar | wenjingxuejie | Sample 1 · Sample 2 |

7. Education & Training

Education and training scenarios require voices that are clear, accurate, and inspiring, capable of effectively conveying knowledge and sparking learning interest. Our TTS excels at capturing the vocal characteristics of instructors across different emotional states.
| Supported Models | Voice Name | Voice ID | Audio Samples |
|---|---|---|---|
| stepaudio-2.5-tts / step-tts-2 | Elegant Gentle | elegantgentle-female | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Gentle Male | wenrounansheng | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2 |
| stepaudio-2.5-tts / step-tts-2 | Mature Gentle | wenroushunv | Sample 1 · Sample 2 |

System Voice ID List

| Voice Name | Voice ID | Supported Models | Recommended Use Cases |
|---|---|---|---|
| Vibrant Youth | vibrant-youth | stepaudio-2.5-tts, step-tts-2 | Audiobook, video dubbing |
| Lively Girl | lively-girl | stepaudio-2.5-tts, step-tts-2 | Audiobook, video dubbing |
| Soft-spoken Gentleman | soft-spoken-gentleman | stepaudio-2.5-tts, step-tts-2 | Emotional companionship, audiobook |
| Magnetic-voiced Male | magnetic-voiced-male | stepaudio-2.5-tts, step-tts-2 | Audiobook, video dubbing |
| Confident Male | zixinnansheng | stepaudio-2.5-tts, step-tts-2 | Audiobook, emotional companionship, education, marketing |
| Elegant Gentle | elegantgentle-female | stepaudio-2.5-tts, step-tts-2 | Customer service, voice-over, education, emotional companionship |
| Lively Breezy | livelybreezy-female | stepaudio-2.5-tts, step-tts-2 | Emotional companionship, customer service, education, marketing |
| Gentle Male | wenrounansheng | stepaudio-2.5-tts, step-tts-2 | Voice-over, emotional companionship, customer service, education |
| Tender Gentleman | wenrougongzi | stepaudio-2.5-tts, step-tts-2 | Emotional companionship, audiobook |
| Spirited Male | yuanqinansheng | stepaudio-2.5-tts, step-tts-2 | Audiobook, voice-over, customer service |
| Classic Female | jingdiannvsheng | stepaudio-2.5-tts, step-tts-2 | Customer service, emotional companionship |
| Mature Gentle | wenroushunv | stepaudio-2.5-tts, step-tts-2 | Customer service, voice-over, education |
| Sweet Female | tianmeinvsheng | stepaudio-2.5-tts, step-tts-2 | Emotional companionship, customer service |
| Pure Girl | qingchunshaonv | stepaudio-2.5-tts, step-tts-2 | Customer service, voice assistant |
| Magnetic Male | cixingnansheng | stepaudio-2.5-tts, step-tts-2 | Audiobook, emotional companionship |
| Spirited Girl | yuanqishaonv | stepaudio-2.5-tts, step-tts-2 | Audiobook, emotional companionship, voice assistant |
| Girl Next Door | linjiajiejie | stepaudio-2.5-tts, step-tts-2 | Voice-over, emotional companionship, voice assistant, video dubbing |
| Upright Youth | zhengpaiqingnian | stepaudio-2.5-tts, step-tts-2 | Marketing, audiobook |
| College Student | qingniandaxuesheng | stepaudio-2.5-tts, step-tts-2 | Voice-over |
| Broadcast Male | boyinnansheng | stepaudio-2.5-tts, step-tts-2 | Audiobook, voice-over |
| Scholarly Gentleman | ruyananshi | stepaudio-2.5-tts, step-tts-2 | Audiobook, emotional companionship, voice-over, voice assistant |
| Deep Male | shenchennanyin | stepaudio-2.5-tts, step-tts-2 | Emotional companionship, audiobook |
| Friendly Female | qinqienvsheng | stepaudio-2.5-tts, step-tts-2 | Voice-over |
| Gentle Female | wenrounvsheng | stepaudio-2.5-tts, step-tts-2 | Audiobook, emotional companionship |
| Clever Girl | jilingshaonv | stepaudio-2.5-tts, step-tts-2 | Voice assistant, voice-over |
| Cute Soft Female | ruanmengnvsheng | stepaudio-2.5-tts, step-tts-2 | Emotional companionship, voice assistant, video dubbing |
| Elegant Female | youyanvsheng | stepaudio-2.5-tts, step-tts-2 | Video dubbing |
| Cool Beauty | lengyanyujie | stepaudio-2.5-tts, step-tts-2 | Video dubbing |
| Bold Sister | shuangkuaijiejie | stepaudio-2.5-tts, step-tts-2 | Voice-over |
| Quiet Scholar | wenjingxuejie | stepaudio-2.5-tts, step-tts-2 | Voice-over |
| Kid Sister | linjiameimei | stepaudio-2.5-tts, step-tts-2 | Video dubbing, voice-over, voice assistant |
| Intellectual Lady | zhixingjiejie | stepaudio-2.5-tts, step-tts-2 | Video dubbing, voice-over, voice assistant |
| Straightforward Male | shuangkuainansheng | stepaudio-2.5-tts, step-tts-2 | Customer service, voice assistant |
| Capable Female | ganliannvsheng | stepaudio-2.5-tts, step-tts-2 | Customer service, voice assistant |
| Warm Female | qinhenvsheng | stepaudio-2.5-tts, step-tts-2 | Customer service, voice assistant |
| Energetic Female | huolinvsheng | stepaudio-2.5-tts, step-tts-2 | Customer service, voice assistant |

Voice Tags List

Voice tags support three categories: speaking style, emotion, and language. Emotion tags must be set in the voice_label.emotion field, while speaking-style tags must be set in the voice_label.style field.
stepaudio-2.5-tts does NOT support voice tags. Use the instruction parameter for emotion and style control instead.
| No. | Tag Name | Tag Type | step-tts-2 |
|---|---|---|---|
| 1 | Happy | Emotion | ✓ |
| 2 | Very Happy | Emotion | ✓ |
| 3 | Sad | Emotion | ✓ |
| 4 | Angry | Emotion | ✓ |
| 5 | Very Angry | Emotion | ✓ |
| 6 | Coquettish | Emotion | ✓ |
| 7 | Slow | Speaking Style | ✓ |
| 8 | Very Slow | Speaking Style | ✓ |
| 9 | Fast | Speaking Style | ✓ |
| 10 | Very Fast | Speaking Style | ✓ |
| 11 | Fearful | Emotion | ✓ |
| 12 | Surprised | Emotion | ✓ |
| 13 | Excited | Emotion | ✓ |
| 14 | Admiring | Emotion | ✓ |
| 15 | Confused | Emotion | ✓ |
| 16 | Cold | Delivery Style | ✓ |
| 17 | Embarrassed | Delivery Style | ✓ |
| 18 | Frustrated | Delivery Style | ✓ |
| 19 | Proud | Delivery Style | ✓ |
| 20 | Tender | Delivery Style | ✓ |
| 21 | Sweet | Delivery Style | ✓ |
| 22 | Outgoing | Delivery Style | ✓ |
| 23 | Serious | Delivery Style | ✓ |
| 24 | Arrogant | Delivery Style | ✓ |
| 25 | Elderly | Delivery Style | ✓ |
| 26 | Shouting | Delivery Style | ✓ |
| 27 | Sarcastic | Delivery Style | ✓ |
| 28 | Stuttering | Delivery Style | ✓ |
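
As a rough sketch of how the tag fields described above might be used, the snippet below builds a request body that sets a single tag under voice_label for step-tts-2, and another that passes an instruction string for stepaudio-2.5-tts. The helper names are hypothetical. Two points are assumptions rather than documented facts: that the tag value is sent as the English tag name shown in the table, and that instruction accepts free-form text. Check the API reference for the exact values before relying on either.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: they only print JSON request bodies and never call the API.
# Values are inserted as-is, so they must not contain double quotes or backslashes.

# step-tts-2: put one tag under voice_label.<field>.
# Only the two fields named in the text above ("emotion", "style") are accepted here.
# ASSUMPTION: the tag value is the English tag name from the table.
build_tag_payload() {
  local field="$1" tag="$2" text="$3" voice="$4"
  case "$field" in
    emotion|style) ;;
    *) echo "field must be 'emotion' or 'style'" >&2; return 1 ;;
  esac
  printf '{"model":"step-tts-2","input":"%s","voice":"%s","voice_label":{"%s":"%s"}}\n' \
    "$text" "$voice" "$field" "$tag"
}

# stepaudio-2.5-tts: voice tags are not supported, so pass an instruction instead.
# ASSUMPTION: instruction takes free-form text.
build_instruction_payload() {
  local instruction="$1" text="$2" voice="$3"
  printf '{"model":"stepaudio-2.5-tts","input":"%s","voice":"%s","instruction":"%s"}\n' \
    "$text" "$voice" "$instruction"
}

build_tag_payload emotion "Happy" "Your order has shipped." "lively-girl"
build_instruction_payload "Speak slowly and warmly." "Your order has shipped." "lively-girl"
```

Either printed body can then be sent with the Quick Start curl command by passing it to --data.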

Output Format

StepFun TTS models support audio output in wav, mp3, flac, opus, and pcm formats. The default format is mp3. You can choose the format that best suits your use case.
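
To illustrate choosing a format, the sketch below checks a requested format against the five listed above and prints a matching request body, falling back to mp3 when none is given. The request field name response_format is an assumption (the text above does not name the parameter), and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds only for the output formats listed in this section.
is_supported_format() {
  case "$1" in
    wav|mp3|flac|opus|pcm) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints a request body for the given format; uses mp3 (the default) when none is given.
# ASSUMPTION: the field is called "response_format"; verify against the API reference.
build_format_payload() {
  local format="${1:-mp3}"
  is_supported_format "$format" || { echo "unsupported format: $format" >&2; return 1; }
  printf '{"model":"step-tts-2","input":"Hello from StepFun.","voice":"lively-girl","response_format":"%s"}\n' "$format"
}

format="wav"
build_format_payload "$format"
# Match the output file extension to the requested format.
echo "save the response as: step.$format"
```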

Output Languages

StepFun TTS models support generating audio in Chinese, English, mixed Chinese-English, and Japanese.

FAQ

Do I own the audio I generate?
Yes. You own the audio you create. However, we recommend informing your end users that the audio was generated by AI so they are aware of its nature.

How do I adjust the volume of the generated audio?
You can set the volume parameter when calling the generation API. Valid values range from 0.1 to 2.0, representing 10% volume to 200% volume.

How do I adjust the speaking rate of the generated audio?
You can set the speed parameter when calling the generation API. Valid values range from 0.5 to 2.0, representing half-speed to double-speed.
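
The volume and speed limits given in the FAQ can be checked on the client before a request is sent. The following is a minimal sketch under two assumptions: that volume and speed are top-level fields of the request body, and that plain decimal values are accepted. The helper names in_range and build_speech_payload are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds when $1 is a plain decimal number within [$2, $3].
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# Prints a request body after validating volume (0.1 to 2.0) and speed (0.5 to 2.0).
# ASSUMPTION: both parameters sit at the top level of the JSON body.
# Text and voice are inserted as-is, so they must not contain double quotes or backslashes.
build_speech_payload() {
  local text="$1" voice="$2" volume="$3" speed="$4"
  in_range "$volume" 0.1 2.0 || { echo "volume must be between 0.1 and 2.0" >&2; return 1; }
  in_range "$speed" 0.5 2.0 || { echo "speed must be between 0.5 and 2.0" >&2; return 1; }
  printf '{"model":"step-tts-2","input":"%s","voice":"%s","volume":%s,"speed":%s}\n' \
    "$text" "$voice" "$volume" "$speed"
}

# 150% volume at 0.8x speed.
build_speech_payload "StepFun is building the next generation of AGI." "lively-girl" 1.5 0.8
```

Validating locally keeps an out-of-range value from costing a round trip, though the API remains the authority on what it accepts.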