Why does streaming improve perceived latency even though total generation time is unchanged?
Because users see the first tokens almost immediately (low time-to-first-token), so they start reading while the rest is still being generated
Because streaming compresses the response so fewer tokens need to be transmitted
Because the model runs faster when streaming mode is enabled
Because streaming bypasses the API rate limiter, allowing faster responses
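
The timing distinction the question asks about can be sketched with a small simulation. This is a minimal illustration, not a real API client: `TOKEN_DELAY_S`, `generate_tokens`, and `visible_times` are hypothetical names, and the per-token delay is an assumed constant tracked on a simulated clock rather than measured.

```python
from typing import Iterator, Tuple

TOKEN_DELAY_S = 0.05  # assumed per-token generation time (hypothetical value)


def generate_tokens(text: str) -> Iterator[Tuple[float, str]]:
    """Yield (simulated_time, token) pairs, one token every TOKEN_DELAY_S."""
    clock = 0.0
    for token in text.split():
        clock += TOKEN_DELAY_S
        yield clock, token


def visible_times(text: str, stream: bool) -> Tuple[float, float]:
    """Return (time the user first sees text, time the response is complete)."""
    arrivals = [t for t, _ in generate_tokens(text)]
    total = arrivals[-1]
    # Streaming shows each token as it arrives; otherwise the client
    # waits for the last token before displaying anything.
    first_visible = arrivals[0] if stream else total
    return first_visible, total


response = "Streaming lets the reader start before generation ends"
stream_first, stream_total = visible_times(response, stream=True)
block_first, block_total = visible_times(response, stream=False)

print(f"streaming:     first text at {stream_first:.2f}s, done at {stream_total:.2f}s")
print(f"non-streaming: first text at {block_first:.2f}s, done at {block_total:.2f}s")
```

In this sketch both modes finish at the same simulated time; only the moment the first text becomes visible differs.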