temperature | number | no | 0.7 | Sampling temperature. Lower values make output more focused: 0 is deterministic and 2 gives maximum randomness. · Range: 0 – 2 |
top_p | number | no | 0.95 | Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1 |
max_tokens | integer | no | 4096 | Maximum output tokens. · Range: 1 – 32768 |
stop | string | no | — | Up to 4 strings at which the model stops generating further tokens. |
reasoning_effort | enum | no | "medium" | Reasoning effort. none disables thinking; low, medium, high, and max set bounded thinking budgets. · Allowed: none, low, medium, high, max |
enable_thinking | boolean | no | true | Enable the model's reasoning channel before the final output. |
thinking_budget | integer | no | 4096 | Maximum thinking tokens before the final answer. If max_tokens is lower, the service reserves room for the answer. · Range: 1024 – 32768 |
top_k | integer | no | 20 | Limit sampling to the top K candidate tokens when supported. · Range: 1 – 200 |
min_p | number | no | 0 | Minimum probability threshold for token sampling. · Range: 0 – 1 |
presence_penalty | number | no | 0 | Penalty for tokens that have already appeared in the generated text. · Range: -2 – 2 |
frequency_penalty | number | no | 0 | Penalty based on how often a token has already appeared. · Range: -2 – 2 |
repetition_penalty | number | no | 1 | Penalty used by SGLang to reduce repeated text. · Range: 0.1 – 2 |
seed | integer | no | — | Optional random seed for reproducible sampling. · Range: 0 – 2147483647 |
logprobs | boolean | no | false | Return token log probabilities when supported. |
top_logprobs | integer | no | — | Return up to this many top token log probabilities. · Range: 0 – 20 |
logit_bias | object | no | — | Bias token IDs by adding positive or negative values before sampling. |
tools | array | no | — | OpenAI-compatible function tool definitions. |
tool_choice | object | no | — | OpenAI-compatible function tool selection. |
response_format | object | no | — | Structured JSON output instructions. |
stream | boolean | no | false | Stream response deltas using server-sent events. |
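
The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service itself: `PARAM_RANGES`, `REASONING_EFFORTS`, `MAX_STOP_STRINGS`, and `validate_params` are hypothetical names introduced for illustration, and the sketch assumes the listed ranges are inclusive at both ends.

```python
from typing import Any

# Numeric ranges copied from the parameter table: name -> (min, max).
# Assumption: both bounds are inclusive.
PARAM_RANGES: dict[str, tuple[float, float]] = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
    "thinking_budget": (1024, 32768),
    "top_k": (1, 200),
    "min_p": (0, 1),
    "presence_penalty": (-2, 2),
    "frequency_penalty": (-2, 2),
    "repetition_penalty": (0.1, 2),
    "seed": (0, 2147483647),
    "top_logprobs": (0, 20),
}

# Allowed values for reasoning_effort, as listed in the table.
REASONING_EFFORTS = {"none", "low", "medium", "high", "max"}

# The table allows up to 4 stop strings.
MAX_STOP_STRINGS = 4


def validate_params(params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of params, raising ValueError on an invalid value."""
    for name, value in params.items():
        if name not in PARAM_RANGES:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number, got {value!r}")
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the range {low} to {high}")

    effort = params.get("reasoning_effort")
    if effort is not None and effort not in REASONING_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(REASONING_EFFORTS)}")

    stop = params.get("stop")
    if isinstance(stop, (list, tuple)) and len(stop) > MAX_STOP_STRINGS:
        raise ValueError(f"stop accepts at most {MAX_STOP_STRINGS} strings")

    return dict(params)


# Example: a request body fragment that passes validation.
request_params = validate_params(
    {"temperature": 0.2, "top_p": 0.9, "max_tokens": 1024, "reasoning_effort": "low"}
)
```

Parameters that are absent from `params` are left out of the result, so the service's own defaults apply. Validating locally only catches range errors early; the service remains the authority on what it accepts.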