Capacity Planner
Estimate deployment capacity and growth boundaries for context, concurrency, and resource planning.
Prompt / Concurrency Capacity Planner
Evaluate deployment feasibility across prompt-token and concurrency combinations for a selected model, quantization, and GPU.
GPU memory per card: 48 GB
Feasible combinations: 0
Largest feasible case: None
| Concurrency \ Prompt tokens | 4096 | 8192 | 16384 | 32768 | 65536 | 131072 |
|---|---|---|---|---|---|---|
| 1 | Overflow: 450.25 per GPU, 402.3 GiB over | Overflow: 451.04 per GPU, 403 GiB over | Overflow: 452.58 per GPU, 404.6 GiB over | Overflow: 456.22 per GPU, 408.2 GiB over | Overflow: 461.34 per GPU, 413.3 GiB over | Overflow: 469.32 per GPU, 421.3 GiB over |
| 2 | Overflow: 451.15 per GPU, 403.1 GiB over | Overflow: 452.34 per GPU, 404.3 GiB over | Overflow: 454.63 per GPU, 406.6 GiB over | Overflow: 459.7 per GPU, 411.7 GiB over | Overflow: 467.67 per GPU, 419.7 GiB over | Overflow: 481.31 per GPU, 433.3 GiB over |
| 4 | Overflow: 452.95 per GPU, 404.9 GiB over | Overflow: 454.95 per GPU, 406.9 GiB over | Overflow: 458.74 per GPU, 410.7 GiB over | Overflow: 466.68 per GPU, 418.7 GiB over | Overflow: 480.33 per GPU, 432.3 GiB over | Overflow: 505.28 per GPU, 457.3 GiB over |
| 8 | Overflow: 456.55 per GPU, 408.6 GiB over | Overflow: 460.15 per GPU, 412.1 GiB over | Overflow: 466.95 per GPU, 418.9 GiB over | Overflow: 480.63 per GPU, 432.6 GiB over | Overflow: 505.64 per GPU, 457.6 GiB over | Overflow: 553.21 per GPU, 505.2 GiB over |
| 16 | Overflow: 463.74 per GPU, 415.7 GiB over | Overflow: 470.57 per GPU, 422.6 GiB over | Overflow: 483.38 per GPU, 435.4 GiB over | Overflow: 508.53 per GPU, 460.5 GiB over | Overflow: 556.27 per GPU, 508.3 GiB over | Overflow: 649.09 per GPU, 601.1 GiB over |
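
The summary figures can be reproduced from the table with a short script. The sketch below is illustrative only and is not the planner's actual implementation. It assumes that each cell's per-GPU figure is in the same unit as the 48 GB card capacity, that a combination is feasible when its per-GPU requirement does not exceed that capacity, and that the "largest" case is ranked by concurrency times prompt tokens. The names `REQUIRED_PER_GPU`, `overflow`, and `summarize` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Card capacity from the summary above. Assumption: same unit as the
# per-GPU figures in the table.
GPU_MEMORY_PER_CARD = 48.0

# Per-GPU requirement copied from the table: {concurrency: {prompt_tokens: required}}.
REQUIRED_PER_GPU: Dict[int, Dict[int, float]] = {
    1: {4096: 450.25, 8192: 451.04, 16384: 452.58, 32768: 456.22, 65536: 461.34, 131072: 469.32},
    2: {4096: 451.15, 8192: 452.34, 16384: 454.63, 32768: 459.70, 65536: 467.67, 131072: 481.31},
    4: {4096: 452.95, 8192: 454.95, 16384: 458.74, 32768: 466.68, 65536: 480.33, 131072: 505.28},
    8: {4096: 456.55, 8192: 460.15, 16384: 466.95, 32768: 480.63, 65536: 505.64, 131072: 553.21},
    16: {4096: 463.74, 8192: 470.57, 16384: 483.38, 32768: 508.53, 65536: 556.27, 131072: 649.09},
}


def overflow(required: float, capacity: float = GPU_MEMORY_PER_CARD) -> float:
    """Amount by which the per-GPU requirement exceeds the card capacity (0 if it fits)."""
    return max(0.0, required - capacity)


def summarize(
    grid: Dict[int, Dict[int, float]], capacity: float
) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Return (number of feasible cells, largest feasible (concurrency, prompt_tokens))."""
    feasible = [
        (concurrency, tokens)
        for concurrency, row in grid.items()
        for tokens, required in row.items()
        if required <= capacity
    ]
    # Assumption: "largest" means the highest concurrency * prompt_tokens product.
    largest = max(feasible, key=lambda case: case[0] * case[1], default=None)
    return len(feasible), largest


if __name__ == "__main__":
    count, largest = summarize(REQUIRED_PER_GPU, GPU_MEMORY_PER_CARD)
    print(f"Feasible combinations: {count}")
    print(f"Largest feasible case: {largest}")
    print(f"Overflow at concurrency 1, 4096 tokens: {overflow(450.25):.2f}")
```

Under these assumptions, even the smallest cell (concurrency 1, 4096 prompt tokens) needs roughly nine times the card's capacity, so no combination in the grid fits and the summary reports zero feasible cases.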