Capacity Planner

Estimate deployment capacity and growth boundaries for context, concurrency, and resource planning.

Prompt / Concurrency Capacity Planner

Evaluate deployment feasibility across prompt-token and concurrency combinations for a selected model, quantization, and GPU.

GPU memory per card: 48 GB
Feasible combinations: 0
Largest feasible case: None
Status: Overflow for every combination. Each cell shows required memory per GPU / overflow (GiB).

| Concurrency \ Prompt tokens | 4096 | 8192 | 16384 | 32768 | 65536 | 131072 |
| 1 | 450.25 / 402.3 | 451.04 / 403 | 452.58 / 404.6 | 456.22 / 408.2 | 461.34 / 413.3 | 469.32 / 421.3 |
| 2 | 451.15 / 403.1 | 452.34 / 404.3 | 454.63 / 406.6 | 459.7 / 411.7 | 467.67 / 419.7 | 481.31 / 433.3 |
| 4 | 452.95 / 404.9 | 454.95 / 406.9 | 458.74 / 410.7 | 466.68 / 418.7 | 480.33 / 432.3 | 505.28 / 457.3 |
| 8 | 456.55 / 408.6 | 460.15 / 412.1 | 466.95 / 418.9 | 480.63 / 432.6 | 505.64 / 457.6 | 553.21 / 505.2 |
| 16 | 463.74 / 415.7 | 470.57 / 422.6 | 483.38 / 435.4 | 508.53 / 460.5 | 556.27 / 508.3 | 649.09 / 601.1 |
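
The overflow figures in the grid appear to be the per-GPU requirement minus the card's memory. The sketch below reproduces that check in Python. It is an illustration, not the planner's actual implementation: the names `GPU_MEMORY_GIB`, `REQUIRED_PER_GPU`, `overflow_gib`, and `feasible_cases` are hypothetical, and it assumes the "per GPU" figures are in GiB and that the 48 GB card is counted as 48 in the same unit, which is what the displayed overflow values suggest.

```python
# Assumption: the card's 48 GB is treated as 48 in the same unit as the grid.
GPU_MEMORY_GIB = 48.0

PROMPT_TOKENS = [4096, 8192, 16384, 32768, 65536, 131072]

# Required memory per GPU, copied from the grid (rows keyed by concurrency).
REQUIRED_PER_GPU = {
    1: [450.25, 451.04, 452.58, 456.22, 461.34, 469.32],
    2: [451.15, 452.34, 454.63, 459.70, 467.67, 481.31],
    4: [452.95, 454.95, 458.74, 466.68, 480.33, 505.28],
    8: [456.55, 460.15, 466.95, 480.63, 505.64, 553.21],
    16: [463.74, 470.57, 483.38, 508.53, 556.27, 649.09],
}


def overflow_gib(required, capacity=GPU_MEMORY_GIB):
    """Memory needed beyond the card's capacity; 0.0 if the case fits."""
    return max(0.0, required - capacity)


def feasible_cases(table, capacity=GPU_MEMORY_GIB):
    """Return (concurrency, prompt_tokens) pairs whose requirement fits."""
    return [
        (concurrency, tokens)
        for concurrency, row in table.items()
        for tokens, required in zip(PROMPT_TOKENS, row)
        if required <= capacity
    ]


cases = feasible_cases(REQUIRED_PER_GPU)
print("Feasible combinations:", len(cases))
print("Largest feasible case:", max(cases) if cases else None)
```

With the values above, no requirement is at or below 48, so the list of feasible cases is empty, matching the summary reported by the planner.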